Indicates how to treat the image elements encountered in the input SVG page during text extraction.
public enum DocumentTextImagesRecognitionMode
Public Enum DocumentTextImagesRecognitionMode
enum class DocumentTextImagesRecognitionMode sealed
|Value||Member||Description|
|0||Auto||Extract text from the SVG elements unless the page consists only of raster (image) elements, in which case OCR is used.|
|1||Disabled||Do not use OCR recognition for the image elements. Instead, ignore the image elements.|
|2||Always||Use OCR recognition on the image elements. Add the recognition data to the final document page text along with the other SVG elements of the page. Requires a valid IOcrEngine instance.|
DocumentTextImagesRecognitionMode is the type of the DocumentText.ImagesRecognitionMode property, which determines how image elements are treated during text extraction from an SVG page. This value has no effect on raster pages, where OCR is always used.
SVG elements can also contain glyphs (paths) that may or may not be considered images and that can also be recognized using the OCR engine. This is controlled by the DocumentText.RecognizeGlyphs property.
The following table shows what occurs during DocumentPage.GetText, depending on the recognition mode and the type of the page:
|ImagesRecognitionMode||Page type||Result|
|Auto||SVG with only text or mixed image and text elements||Only the text elements are extracted|
|Auto||SVG with raster elements only||The image elements are recognized and text extracted using the OCR engine|
|Disabled||SVG with only text or mixed image and text elements||Only the text elements are extracted|
|Disabled||SVG with raster elements only||No text is extracted|
|Always||SVG with only text or mixed image and text elements||The text elements are extracted and the image elements are recognized and text extracted using the OCR engine|
|Always||SVG with raster elements only||The image elements are recognized and text extracted using the OCR engine|
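The rows above can be read as a small decision function. The following is a minimal, standalone sketch in standard C++ that models the table and the rule that raster pages always use OCR. It does not call LEADTOOLS: the local `ImagesRecognitionMode` enum only mirrors the documented member values, and `PageContent`, `ExtractionPlan`, and `planExtraction` are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Local stand-in mirroring the documented member values (0, 1, 2).
// This is NOT the LEADTOOLS type; it exists only for this sketch.
enum class ImagesRecognitionMode { Auto = 0, Disabled = 1, Always = 2 };

// Hypothetical classification of a page, matching the table's "page type" column
// plus the raster case described in the remarks.
enum class PageContent {
    Raster,          // raster page: the mode has no effect, OCR is always used
    SvgTextOrMixed,  // SVG with only text, or mixed image and text elements
    SvgImagesOnly    // SVG with raster (image) elements only
};

// Hypothetical summary of what text extraction would do for a page.
struct ExtractionPlan {
    bool extractSvgText;  // text elements are read directly from the SVG
    bool useOcr;          // image content is recognized with the OCR engine
};

// Encodes the table: one branch per mode, with raster pages handled up front.
inline ExtractionPlan planExtraction(ImagesRecognitionMode mode, PageContent content) {
    if (content == PageContent::Raster) {
        return {false, true};  // raster pages ignore the mode
    }
    const bool hasText = (content == PageContent::SvgTextOrMixed);
    switch (mode) {
        case ImagesRecognitionMode::Auto:
            // OCR only when the SVG contains nothing but image elements.
            return {hasText, !hasText};
        case ImagesRecognitionMode::Disabled:
            // Image elements are ignored; an image-only SVG yields no text.
            return {hasText, false};
        case ImagesRecognitionMode::Always:
            // Image elements are always recognized, in addition to any text.
            return {hasText, true};
    }
    return {false, false};  // not reached for valid enumerators
}
```

In this model, `Always` is the only mode that sets `useOcr` for a mixed SVG page, which is why the member description notes that it requires a valid IOcrEngine instance.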
The engine uses DocumentPage.IsSvgSupported and DocumentPage.IsSvgConversionPreferred, and also inspects the elements of the page's SVG (returned by DocumentPage.GetSvg), to decide which of the actions described above to perform.
Note: When using the OcrEngineType.LEAD engine, DocumentPage.GetText will try to optimize the speed of OCR recognition for text format output (for instance, it will not try to recognize font decorations such as bold or italic). This is done by checking if Recognition.AutoRecognizeManager.FormatSpeedOptimized is true (the default value). This optimization can result in DocumentPage.GetText producing slightly different recognition on complex input raster images than IOcrPage.GetText, which does not use the value of the setting. Therefore, if producing the same exact results from the two methods is important, set the value of the Recognition.AutoRecognizeManager.FormatSpeedOptimized setting to false in the IOcrEngine used with the document. Refer to LEADTOOLS OCR Module - LEAD Engine Settings for more information.