DocumentTextImagesRecognitionMode Enumeration

Summary

Indicates how to treat the image elements encountered in the input SVG page during text extraction.

Syntax

C#

[SerializableAttribute()]
[DataContractAttribute()]
public enum DocumentTextImagesRecognitionMode

C++/CLI

public:
   [SerializableAttribute,
   DataContractAttribute]
   enum class DocumentTextImagesRecognitionMode sealed

Python

class DocumentTextImagesRecognitionMode(Enum):
   Auto = 0
   Disabled = 1
   Always = 2

Members

Value	Member	Description
0	Auto	Extract text from the SVG text elements only, unless the page contains only raster image elements, in which case the images are recognized using the OCR engine.
1	Disabled	Do not use OCR on the image elements; ignore them.
2	Always	Use OCR on the image elements and add the recognition data to the final page text, together with the text of the other SVG elements of the page. Requires a valid IOcrEngine instance.

Remarks

Use DocumentTextImagesRecognitionMode to set the DocumentText.ImagesRecognitionMode property, which determines how image elements are treated during text extraction from an SVG page. This value has no effect on raster pages, for which OCR is always used.

SVG pages can also contain glyphs (paths) that may or may not be considered images and that can also be recognized using the OCR engine. This is controlled by the DocumentText.RecognizeGlyphs property.

The following table shows what DocumentPage.GetText does for each value, depending on the type of the page:

Value	Page Type	Behavior
Auto	SVG with only text or mixed image and text elements	Only the text elements are extracted
Auto	SVG with raster elements only	The image elements are recognized and text extracted using the OCR engine
Disabled	SVG with only text or mixed image and text elements	Only the text elements are extracted
Disabled	SVG with raster elements only	No text is extracted
Always	SVG with only text or mixed image and text elements	The text elements are extracted and the image elements are recognized and text extracted using the OCR engine
Always	SVG with raster elements only	The image elements are recognized and text extracted using the OCR engine
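The table above can be restated as a small decision function. This is a minimal Python sketch for illustration only, not part of the LEADTOOLS API: `text_sources` and its `svg_page` and `raster_only` flags are hypothetical names, the enum is redeclared locally with the documented values so the snippet runs on its own, and it assumes a started OCR engine is set whenever OCR is used.

```python
from enum import Enum


class DocumentTextImagesRecognitionMode(Enum):
    """Local copy of the documented values so the sketch runs standalone."""
    Auto = 0
    Disabled = 1
    Always = 2


def text_sources(mode, svg_page, raster_only=False):
    """Return the (hypothetical) set of extraction paths applied to a page.

    "text" means SVG text elements are extracted directly; "ocr" means
    image elements are recognized with the OCR engine. Assumes a started
    OCR engine is available whenever "ocr" is returned.
    """
    M = DocumentTextImagesRecognitionMode
    if not svg_page:
        # Raster page: the mode has no effect and OCR is always used.
        return {"ocr"}
    if mode is M.Disabled:
        # Image elements are ignored, so an all-raster SVG page yields no text.
        return set() if raster_only else {"text"}
    if mode is M.Always:
        # Image elements are always recognized; text elements are still extracted.
        return {"ocr"} if raster_only else {"text", "ocr"}
    # Auto: fall back to OCR only when the SVG page is all raster.
    return {"ocr"} if raster_only else {"text"}
```

For example, `text_sources(DocumentTextImagesRecognitionMode.Auto, svg_page=True, raster_only=True)` returns `{'ocr'}`, matching the second row of the table.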

The engine uses DocumentPage.IsSvgSupported and DocumentPage.IsSvgConversionPreferred, and inspects the elements of the page SVG (returned by DocumentPage.GetSvg), to perform the actions described above.

When Always is used, a valid (started) IOcrEngine instance must be set in DocumentText.OcrEngine.

When Auto is used, a valid (started) IOcrEngine instance should be set in DocumentText.OcrEngine. If this value is null, then the framework will behave as if Disabled were used.
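The OCR engine requirements for Always and Auto can be sketched as a function that resolves the mode actually in effect. This is a hypothetical Python model, not LEADTOOLS code: `StubOcrEngine` stands in for an IOcrEngine and only tracks whether it is started, and raising `ValueError` for Always without a started engine is an assumption, since the documentation only states that such an engine must be set.

```python
from enum import Enum
from typing import Optional


class DocumentTextImagesRecognitionMode(Enum):
    Auto = 0
    Disabled = 1
    Always = 2


class StubOcrEngine:
    """Hypothetical stand-in for an IOcrEngine; tracks only the started state."""

    def __init__(self, is_started: bool) -> None:
        self.is_started = is_started


def effective_mode(mode, ocr_engine: Optional[StubOcrEngine]):
    """Return the mode that text extraction behaves as, given the OCR engine."""
    M = DocumentTextImagesRecognitionMode
    if mode is M.Always:
        if ocr_engine is None or not ocr_engine.is_started:
            # Assumption: modeled as an error; the docs only say a started engine is required.
            raise ValueError("Always requires a started OCR engine")
        return M.Always
    if mode is M.Auto and ocr_engine is None:
        # Documented: with no engine set, Auto behaves as if Disabled were used.
        return M.Disabled
    # Disabled never needs an engine; Auto with an engine set is unchanged.
    return mode
```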

Note: When using the OcrEngineType.LEAD engine, DocumentPage.GetText tries to optimize the speed of OCR recognition for text output (for instance, it does not try to recognize font decorations such as bold or italic). It does this when Recognition.AutoRecognizeManager.FormatSpeedOptimized is true (the default value). This optimization can cause DocumentPage.GetText to produce slightly different recognition results on complex raster input images than IOcrPage.GetText, which ignores this setting. Therefore, if the two methods must produce exactly the same results, set Recognition.AutoRecognizeManager.FormatSpeedOptimized to false in the IOcrEngine used with the document. Refer to LEADTOOLS OCR Module - LEAD Engine Settings for more information.

Requirements

Target Platforms

Reference

Leadtools.Document Namespace


Help Version 22.0.2023.4.21

Leadtools.Document Assembly
