Gets the text content of this page.
public DocumentPageText GetText()
public:
DocumentPageText^ GetText()
public DocumentPageText getText()
def GetText(self):
The text content of this DocumentPage as a DocumentPageText object.
GetText is used to parse the text content of a page. All document types support this method; internally, it uses the LEADTOOLS SVG or OCR engine to obtain the text from the document.
SetText is used to replace the text content of the page. IsTextModified is a flag that indicates that the text of this page has been replaced by the user.
This method works as follows (the "item" is the text content):
1. If an item is found in the cache, it is returned right away. This is possible only if the document was created using the cache system and LEADDocument.CacheOptions contains DocumentCacheOptions.PageText.
2. Otherwise, the values of DocumentImages.IsSvgSupported and DocumentText.TextExtractionMode determine whether the text is parsed from the page using the LEADTOOLS SVG or OCR engine.
3. If SVG is supported, then GetSvg is called and the text is parsed from the SVG content directly without the use of OCR.
4. Otherwise, if OCR is supported, then GetImage is called and the text is parsed from the raster image using OCR.
5. If this document uses the cache system (LEADDocument.HasCache is true), then the DocumentPageText object is saved to the cache before it is returned. The next time this method is called, the text is returned from the cache directly without being parsed again.
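The lookup order described above can be modeled with a small stand-alone sketch. It does not use the LEADTOOLS API: the class PageTextLookupSketch, its parameters, and the plain String used in place of DocumentPageText are all hypothetical names introduced for illustration. Returning null when neither SVG nor OCR is supported is an assumption of this sketch, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of the lookup order; NOT the LEADTOOLS implementation.
final class PageTextLookupSketch {
    // Stand-in for the document cache, keyed by page number.
    private final Map<Integer, String> cache = new HashMap<>();
    // Stand-in for LEADDocument.HasCache.
    private final boolean hasCache;
    // Counts how many times the text was actually parsed (not served from cache).
    int parseCount = 0;

    PageTextLookupSketch(boolean hasCache) {
        this.hasCache = hasCache;
    }

    String getText(int pageNumber, boolean svgSupported, boolean ocrSupported,
                   Supplier<String> parseFromSvg, Supplier<String> parseWithOcr) {
        // Return the cached item right away when one exists.
        if (hasCache && cache.containsKey(pageNumber)) {
            return cache.get(pageNumber);
        }

        String text;
        if (svgSupported) {
            // Prefer SVG: the text is read from the SVG content, no OCR involved.
            text = parseFromSvg.get();
        } else if (ocrSupported) {
            // Fall back to OCR on the raster image.
            text = parseWithOcr.get();
        } else {
            // Assumption of this sketch: nothing can be parsed, so return null.
            return null;
        }
        parseCount++;

        // Save the result so the next call skips parsing.
        if (hasCache) {
            cache.put(pageNumber, text);
        }
        return text;
    }
}
```

With a cache-enabled instance, calling getText twice for the same page number invokes a parser only once; the second call is answered from the map.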
When the value of DocumentText.AutoParseLinks is true and the page text is first obtained using GetText, then the document will attempt to also parse the text for links based on the regular expressions stored in LinkPatterns.
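To illustrate what regular-expression-based link parsing looks like in general, here is a minimal stand-alone sketch using java.util.regex. It is not the library's implementation: the class LinkPatternSketch, the findLinks helper, and the URL pattern shown in the usage note are hypothetical, and the actual expressions stored in LinkPatterns are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of pattern-based link detection; NOT the LEADTOOLS implementation.
final class LinkPatternSketch {
    // Returns the {start, end} character offsets (end exclusive) of every
    // match of every pattern in the given page text.
    static List<int[]> findLinks(String pageText, List<Pattern> patterns) {
        List<int[]> ranges = new ArrayList<>();
        for (Pattern pattern : patterns) {
            Matcher matcher = pattern.matcher(pageText);
            while (matcher.find()) {
                ranges.add(new int[] { matcher.start(), matcher.end() });
            }
        }
        return ranges;
    }
}
```

For example, passing a list containing Pattern.compile("https?://\\S+") (an illustrative pattern chosen for this sketch) and the text "Visit https://example.com today" yields a single range whose substring is the URL itself.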
Note that if SetText has been previously called with a null object for the text parameter, then this method will return null as well.
In all cases, the returned DocumentPageText object is not used by this LEADDocument after it has been returned.
The LEADTOOLS Document Viewer uses this method to obtain the text used with all text operations, such as Find, Select Text, and Text Review annotation objects.
For more information, refer to Parsing Text with the Document Library.
using System;
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Document.Writer;
using Leadtools.Document;
using Leadtools.Caching;
using Leadtools.Annotations.Engine;
using Leadtools.Ocr;
using Leadtools.Barcode;
using Leadtools.Document.Converter;
public void DocumentPageGetTextExample()
{
   var options = new LoadDocumentOptions();
   using (var document = DocumentFactory.LoadFromFile(Path.Combine(LEAD_VARS.ImagesDir, "Leadtools.tif"), options))
   {
      // For the TIF file we need an OCR engine
      var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
      var rasterCodecs = new RasterCodecs();
      var documentWriter = new DocumentWriter();
      ocrEngine.Startup(rasterCodecs, documentWriter, null, LEAD_VARS.OcrLEADRuntimeDir);
      document.Text.OcrEngine = ocrEngine;

      // Get the text of the first page
      var page = document.Pages[0];
      var pageText = page.GetText();
      if (pageText != null)
      {
         // Build the Text string from the parsed characters
         pageText.BuildText();
         var text = pageText.Text;
         Console.WriteLine(text);
      }
      else
      {
         Console.WriteLine("Failed!");
      }
   }
}

static class LEAD_VARS
{
   public const string ImagesDir = @"C:\LEADTOOLS23\Resources\Images";
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime";
}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;
import org.junit.*;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;
import org.junit.runner.notification.Failure;
import static org.junit.Assert.*;
import leadtools.*;
import leadtools.annotations.engine.*;
import leadtools.barcode.*;
import leadtools.caching.*;
import leadtools.codecs.*;
import leadtools.document.*;
import leadtools.document.DocumentMimeTypes.UserGetDocumentStatusHandler;
import leadtools.document.converter.*;
import leadtools.document.writer.*;
import leadtools.ocr.*;
public void documentPageGetTextExample() {
   final String LEAD_VARS_IMAGES_DIR = "C:\\LEADTOOLS23\\Resources\\Images";
   final String OCR_LEAD_RUNTIME_DIR = "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime";

   LoadDocumentOptions options = new LoadDocumentOptions();
   // Build the file path with java.io.File instead of an undefined combine() helper
   String fileName = new File(LEAD_VARS_IMAGES_DIR, "ocr1.tif").getPath();
   LEADDocument document = DocumentFactory.loadFromFile(fileName, options);

   // For the TIF file we need an OCR engine
   OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD);
   RasterCodecs rasterCodecs = new RasterCodecs();
   DocumentWriter documentWriter = new DocumentWriter();
   ocrEngine.startup(rasterCodecs, documentWriter, null, OCR_LEAD_RUNTIME_DIR);
   document.getText().setOcrEngine(ocrEngine);

   // Get the text of the first page
   DocumentPage page = document.getPages().get(0);
   DocumentPageText pageText = page.getText();
   String text = "";
   if (pageText != null) {
      // Build the text string from the parsed characters
      pageText.buildText();
      text = pageText.getText();
      System.out.println(text);
   } else {
      System.out.println("Failed!");
   }
   assertTrue(text != null && !text.equals(""));
}