Contains the text characters and words found in a document page.
[SerializableAttribute()]
[DataContractAttribute()]
public class DocumentPageText

<SerializableAttribute(), DataContractAttribute()>
Public Class DocumentPageText

public [SerializableAttribute, DataContractAttribute] ref class DocumentPageText

public class DocumentPageText implements Serializable

The text of a document page can be read by using the DocumentPage.GetText method. The text characters found in the page are set in the Characters property of the returned DocumentPageText object.
The text words are created from the characters found in the document based on the IsEndOfWord value returned by the document parsing engine. Whenever an "end of word" is found, the last set of characters is grouped together and stored as an item in the Words list. This is not performed automatically; instead, you must call BuildWords to populate the Words list from the Characters.
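The grouping described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the LEADTOOLS implementation: PageChar and Word are hypothetical stand-in types, and flushing trailing characters that never see an end-of-word flag is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class WordGroupingSketch {
    // Hypothetical stand-in for a parsed page character (not a LEADTOOLS type).
    static final class PageChar {
        final char code;
        final boolean isEndOfWord;

        PageChar(char code, boolean isEndOfWord) {
            this.code = code;
            this.isEndOfWord = isEndOfWord;
        }
    }

    // Hypothetical stand-in for a word: its text plus the inclusive range
    // of indexes it covers in the character list.
    static final class Word {
        final String value;
        final int firstCharacterIndex;
        final int lastCharacterIndex;

        Word(String value, int firstCharacterIndex, int lastCharacterIndex) {
            this.value = value;
            this.firstCharacterIndex = firstCharacterIndex;
            this.lastCharacterIndex = lastCharacterIndex;
        }
    }

    // Groups consecutive characters into a word each time an end-of-word flag is seen.
    static List<Word> buildWords(List<PageChar> characters) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int first = 0;
        for (int i = 0; i < characters.size(); i++) {
            PageChar c = characters.get(i);
            if (current.length() == 0) {
                first = i; // a new word starts here
            }
            current.append(c.code);
            if (c.isEndOfWord) {
                words.add(new Word(current.toString(), first, i));
                current.setLength(0);
            }
        }
        // Assumption: leftover characters without a flag form a final word.
        if (current.length() > 0) {
            words.add(new Word(current.toString(), first, characters.size() - 1));
        }
        return words;
    }

    public static void main(String[] args) {
        List<PageChar> characters = new ArrayList<>();
        characters.add(new PageChar('H', false));
        characters.add(new PageChar('i', true));
        characters.add(new PageChar('o', false));
        characters.add(new PageChar('k', true));
        for (Word w : buildWords(characters)) {
            System.out.println(w.value + " [" + w.firstCharacterIndex + ".." + w.lastCharacterIndex + "]");
        }
    }
}
```

Because each word records the first and last index it covers, the original characters remain reachable from any word without copying them.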
The document page text can also be obtained as a simple string object through the Text property. This is not performed automatically; you must call BuildText to populate this property with the text value from Characters. Note that BuildText also builds the words by calling BuildWords first if the user has not already done so.
The FirstCharacterIndex and LastCharacterIndex of the DocumentWord object can be used to map the word back to the original characters in the Characters list. Similarly, you can use BuildTextWithMap to populate Text as well as the TextMap list, which can be used to map each position in the text string back to its source in the Characters list.
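To make the idea of a text map concrete, here is a minimal Java sketch that builds a string together with a parallel list of character indexes. It is not the LEADTOOLS implementation: PageChar and MappedText are hypothetical stand-in types, and both the inserted space between words and the use of -1 for positions with no source character are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TextMapSketch {
    // Hypothetical stand-in for a parsed page character (not a LEADTOOLS type).
    static final class PageChar {
        final char code;
        final boolean isEndOfWord;

        PageChar(char code, boolean isEndOfWord) {
            this.code = code;
            this.isEndOfWord = isEndOfWord;
        }
    }

    // Hypothetical result type: the text plus one map entry per text position.
    static final class MappedText {
        final String text;
        final List<Integer> map; // map.get(i) is the source index of text.charAt(i)

        MappedText(String text, List<Integer> map) {
            this.text = text;
            this.map = map;
        }
    }

    // Builds the text and records, for every position, which character produced it.
    static MappedText buildTextWithMap(List<PageChar> characters) {
        StringBuilder text = new StringBuilder();
        List<Integer> map = new ArrayList<>();
        for (int i = 0; i < characters.size(); i++) {
            PageChar c = characters.get(i);
            text.append(c.code);
            map.add(i);
            // Assumption: words are separated by a space that has no source
            // character, marked here with -1.
            if (c.isEndOfWord && i < characters.size() - 1) {
                text.append(' ');
                map.add(-1);
            }
        }
        return new MappedText(text.toString(), map);
    }

    public static void main(String[] args) {
        List<PageChar> characters = new ArrayList<>();
        characters.add(new PageChar('H', false));
        characters.add(new PageChar('i', true));
        characters.add(new PageChar('o', false));
        characters.add(new PageChar('k', true));
        MappedText result = buildTextWithMap(characters);
        System.out.println(result.text); // prints "Hi ok"
        System.out.println(result.map);  // prints "[0, 1, -1, 2, 3]"
    }
}
```

With such a map, a match found at some offset in the text (for example, from a string search) can be traced back to the characters that produced it by looking up the map entries for that offset range.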
At any time, you can update the Characters list and call any of the methods above to re-generate Words, Text and TextMap. To clear the generated values, use ClearBuildData.
The text is parsed from the original document using either SVG or OCR technologies. For more information, refer to Parsing Text with the Documents Library.
using System;
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Forms.DocumentWriters;
using Leadtools.Svg;
using Leadtools.Documents;
using Leadtools.Caching;
using Leadtools.Annotations.Core;
using Leadtools.Forms.Ocr;
using Leadtools.Barcode;

public static void DocumentPageTextExample()
{
   var options = new LoadDocumentOptions();
   using (var document = DocumentFactory.LoadFromFile(Path.Combine(ImagesPath.Path, "Leadtools.doc"), options))
   {
      // Get the text of the first page
      var page = document.Pages[0];
      var pageText = page.GetText();
      if (pageText != null)
      {
         // Text is not populated automatically; BuildText fills it from Characters
         pageText.BuildText();
         var text = pageText.Text;
         Console.WriteLine(text);
      }
      else
      {
         Console.WriteLine("Failed!");
      }
   }
}

Imports System
Imports System.IO
Imports Leadtools
Imports Leadtools.Codecs
Imports Leadtools.Forms.DocumentWriters
Imports Leadtools.Svg
Imports Leadtools.Documents
Imports Leadtools.Caching
Imports Leadtools.Annotations.Core
Imports Leadtools.Barcode
Imports Leadtools.Forms.Ocr

Public Shared Sub DocumentPageTextExample()
   Dim options As New LoadDocumentOptions()
   Using document As Document = DocumentFactory.LoadFromFile(Path.Combine(ImagesPath.Path, "Leadtools.doc"), options)
      ' Get the text of the first page
      Dim page As Leadtools.Documents.DocumentPage = document.Pages(0)
      Dim pageText As DocumentPageText = page.GetText()
      If Not IsNothing(pageText) Then
         ' Text is not populated automatically; BuildText fills it from Characters
         pageText.BuildText()
         Dim text As String = pageText.Text
         Console.WriteLine(text)
      Else
         Console.WriteLine("Failed!")
      End If
   End Using
End Sub
Help Version 19.0.2017.3.22