
DocumentTextImagesRecognitionMode Enumeration

Summary

Indicates how to treat the image elements encountered in the input SVG page during text extraction.

Syntax
C#
[SerializableAttribute()]
[DataContractAttribute()]
public enum DocumentTextImagesRecognitionMode

VB
<SerializableAttribute(),
 DataContractAttribute()>
Public Enum DocumentTextImagesRecognitionMode

C++
public:
   [SerializableAttribute,
   DataContractAttribute]
   enum class DocumentTextImagesRecognitionMode sealed

Members
Value | Member | Description
0 | Auto | Use the SVG engine to extract text unless the page contains only raster (image) elements, in which case the OCR engine is used.
1 | Disabled | Do not use OCR on the image elements; they are ignored.
2 | Always | Use OCR on the image elements. The recognized text is added to the final document page text along with the text from the other SVG elements of the page. Requires a valid IOcrEngine instance.

Remarks

Set DocumentText.ImagesRecognitionMode to a DocumentTextImagesRecognitionMode value to determine how image elements are treated during text extraction from an SVG page. This value has no effect on raster pages, for which OCR is always used.

SVG elements can also contain glyphs (paths) that may or may not be considered images and can also be recognized using the OCR engine. This is controlled by the DocumentText.RecognizeGlyphs property.

The following table shows what occurs during DocumentPage.GetText, depending on the value used and the type of the page:

Value | Page Type | Behavior
Auto | SVG with only text or mixed image and text elements | Only the text elements are extracted
Auto | SVG with raster elements only | The image elements are recognized and text extracted using the OCR engine
Disabled | SVG with only text or mixed image and text elements | Only the text elements are extracted
Disabled | SVG with raster elements only | No text is extracted
Always | SVG with only text or mixed image and text elements | The text elements are extracted and the image elements are recognized and text extracted using the OCR engine
Always | SVG with raster elements only | The image elements are recognized and text extracted using the OCR engine
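
The table above can be read as a small decision function. The following standard C++ sketch models it for illustration only: RecognitionMode, SvgPageContent, ExtractionPlan, planFor and checkAgainstTable are hypothetical names that are not part of the LEADTOOLS API, and the code does not call the toolkit.

```cpp
#include <cassert>

// Hypothetical model types for illustration; not the LEADTOOLS declarations.
enum class RecognitionMode { Auto = 0, Disabled = 1, Always = 2 };

// The two SVG page types the table distinguishes.
enum class SvgPageContent { TextOrMixed, RasterOnly };

// Outcome of text extraction for one (mode, page type) combination.
struct ExtractionPlan {
    bool extractTextElements;  // SVG text elements contribute to the page text
    bool recognizeImages;      // image elements are passed to the OCR engine
};

inline bool operator==(const ExtractionPlan& a, const ExtractionPlan& b) {
    return a.extractTextElements == b.extractTextElements &&
           a.recognizeImages == b.recognizeImages;
}

// Maps one table row to its documented behavior.
inline ExtractionPlan planFor(RecognitionMode mode, SvgPageContent content) {
    const bool hasText = (content == SvgPageContent::TextOrMixed);
    switch (mode) {
    case RecognitionMode::Auto:
        // OCR is used only when the page has no text elements at all.
        return {hasText, !hasText};
    case RecognitionMode::Disabled:
        // Image elements are ignored; a raster-only page yields no text.
        return {hasText, false};
    case RecognitionMode::Always:
        // Text elements are extracted and image elements are recognized.
        return {hasText, true};
    }
    return {false, false};  // not reached for valid enum values
}

// Checks the model against all six rows of the table.
inline void checkAgainstTable() {
    using M = RecognitionMode;
    using C = SvgPageContent;
    assert((planFor(M::Auto, C::TextOrMixed) == ExtractionPlan{true, false}));
    assert((planFor(M::Auto, C::RasterOnly) == ExtractionPlan{false, true}));
    assert((planFor(M::Disabled, C::TextOrMixed) == ExtractionPlan{true, false}));
    assert((planFor(M::Disabled, C::RasterOnly) == ExtractionPlan{false, false}));
    assert((planFor(M::Always, C::TextOrMixed) == ExtractionPlan{true, true}));
    assert((planFor(M::Always, C::RasterOnly) == ExtractionPlan{false, true}));
}
```

In this model, Auto and Disabled differ only for raster-only pages, while Always differs from Auto only for pages that contain text elements.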

The engine uses DocumentPage.IsSvgSupported and DocumentPage.IsSvgConversionPreferred, and inspects the SVG elements of the page (returned by DocumentPage.GetSvg), to perform the actions described above.

When Always is used, a valid (started) IOcrEngine instance must be set in DocumentText.OcrEngine.

When Auto is used, a valid (started) IOcrEngine instance should be set in DocumentText.OcrEngine. If this value is null, then the framework will behave as if Disabled were used.
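
The two engine requirements above can be summarized as a rule for resolving the mode that is actually applied. The following standard C++ sketch is a model only: RecognitionMode, effectiveMode and checkFallback are hypothetical names, not LEADTOOLS API. The exception for Always is an assumption, since the text states that an engine must be set but does not say how the framework reacts when it is missing.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model type for illustration; not the LEADTOOLS declaration.
enum class RecognitionMode { Auto = 0, Disabled = 1, Always = 2 };

// Resolves the mode that is applied, given whether a started OCR engine
// has been set (the role played by DocumentText.OcrEngine in the text).
inline RecognitionMode effectiveMode(RecognitionMode requested, bool engineStarted) {
    if (engineStarted) {
        return requested;  // with an engine, every mode applies as requested
    }
    switch (requested) {
    case RecognitionMode::Auto:
        // Documented fallback: without an engine, Auto behaves as Disabled.
        return RecognitionMode::Disabled;
    case RecognitionMode::Always:
        // Assumption: the source only says an engine "must be set"; throwing
        // here is this sketch's choice, not documented framework behavior.
        throw std::invalid_argument("Always requires a started OCR engine");
    case RecognitionMode::Disabled:
        break;  // Disabled never needs an engine
    }
    return RecognitionMode::Disabled;
}

// Exercises the documented Auto fallback.
inline void checkFallback() {
    assert(effectiveMode(RecognitionMode::Auto, false) == RecognitionMode::Disabled);
    assert(effectiveMode(RecognitionMode::Auto, true) == RecognitionMode::Auto);
}
```

Under this model, code that relies on Auto to recognize raster-only SVG pages still needs a started engine; otherwise those pages yield no text, as in the Disabled rows of the table.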

Note: When using the OcrEngineType.LEAD engine, DocumentPage.GetText tries to optimize the speed of OCR recognition for text format output (for instance, it does not try to recognize font decorations such as bold or italic). It does so when Recognition.AutoRecognizeManager.FormatSpeedOptimized is true (the default value). This optimization can cause DocumentPage.GetText to produce slightly different recognition results on complex input raster images than IOcrPage.GetText, which does not use the value of this setting. Therefore, if producing exactly the same results from the two methods is important, set Recognition.AutoRecognizeManager.FormatSpeedOptimized to false in the IOcrEngine used with the document. Refer to LEADTOOLS OCR Module - LEAD Engine Settings for more information.

Requirements

Target Platforms

See Also

Reference

Leadtools.Document Namespace

Help Version 20.0.2020.4.3
© 1991-2020 LEAD Technologies, Inc. All Rights Reserved.

Leadtools.Document Assembly