LEADTOOLS OCR Advantage Engine Settings

The following table describes the settings supported by the LEADTOOLS OCR Advantage Engine:

Name

Type

Range and values

Description

Recognition

BeginCategory

N/A

Beginning of the recognition settings category.

Recognition.RecognitionModuleTradeoff

Enum

Accurate, Balanced, Fast

Recognition module tradeoff between speed and accuracy. Default value is Balanced.

Recognition.ModifyProcessingImage

Boolean

N/A

True to modify the processing image after recognition; otherwise, False.

It is best to set this setting to True if L_OcrPage_Recognize is called only once per page.

L_OcrAutoRecognizeManager temporarily sets this setting to True while performing a recognition job.

Recognition.DetectColors

Boolean

N/A

Automatically detect the foreground and background colors of each character. Default value is False.

If this value is True, the engine tries to detect the colors of the zones automatically when L_OcrPage_AutoZone is called, and sets the values in the ForeColor and BackColor members of the L_OcrZone structure.

Recognition.AutoSecondPass

Boolean

N/A

Automatically perform a second image-processing cleanup pass on the internal B/W image if the first pass did not provide satisfactory results. Default value is True.

Recognition.MaximumPageConventionalMemorySize

Integer

0 to 2147483647

L_OcrAutoRecognizeManager supports loading bitmap handle objects directly from disk files. The loaded bitmap handle holds the original image, which is only needed when saving graphics zones or image-over-text overlays. If this image is large and was created using conventional memory, the process spends a large amount of its physical memory holding the image instead of using that memory for other purposes such as auto-zoning or recognition. This is more noticeable in multi-threaded applications, where loading several large images into conventional memory can cause out-of-memory errors in operations that would normally succeed.

L_OcrEngine can automatically switch to the disk memory feature of BITMAPHANDLE if the size of the image in memory would exceed the value set in "MaximumPageConventionalMemorySize".

"MaximumPageConventionalMemorySize" is in KBytes. The default value is 42984 (about 42 MBytes) for x86 and 429840 (about 420 MBytes) for x64. This value allows a typical OCR image of 8.5 by 11 inches at 300 DPI and 32 bits per pixel to stay in conventional memory, while anything significantly larger uses disk memory mode.

Naturally, using disk memory is slower than using conventional memory; the exact ratio depends on the speed of the machine's hard drive. However, using disk memory might still speed up the overall process: freeing physical memory improves the performance of other operations such as auto-zoning and recognition, and the load operation, although certainly slower, might not account for a large part of the overall time.

The exact value to set depends on the system hardware configuration, the number of cores, and the application type. Experiment with this value if your application gets out-of-memory errors.

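
The size comparison described for MaximumPageConventionalMemorySize can be sketched as plain arithmetic. This is an illustration only, not LEADTOOLS code: the helper names are hypothetical, row padding is ignored, and "exceed" is assumed to mean strictly greater than the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in KBytes of an uncompressed image, rounded up.
   Assumption: no row padding or per-bitmap overhead is counted. */
static uint64_t image_size_kbytes(uint32_t width_px, uint32_t height_px,
                                  uint32_t bits_per_pixel)
{
    uint64_t bits  = (uint64_t)width_px * height_px * bits_per_pixel;
    uint64_t bytes = (bits + 7u) / 8u;
    return (bytes + 1023u) / 1024u;
}

/* Hypothetical helper: true when the image is larger than the
   conventional-memory limit, which is expressed in KBytes. */
static bool would_use_disk_memory(uint64_t size_kbytes,
                                  uint64_t max_conventional_kbytes)
{
    return size_kbytes > max_conventional_kbytes;
}
```

An 8.5 by 11 inch page at 300 DPI is 2550 by 3300 pixels, so image_size_kbytes(2550, 3300, 32) gives 32872 KBytes, which fits under the x86 default of 42984. The same page at 600 DPI (5100 by 6600 pixels) needs 131485 KBytes, which exceeds the x86 default but not the x64 default of 429840.
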
Recognition.Threading

BeginCategory

N/A

Beginning of the recognition thread settings category.

Recognition.Threading.MaximumThreads

Integer

0 to 2147483647

The maximum number of threads to use in recognition. The LEADTOOLS OCR Advantage Engine supports recognizing document zones in separate threads, which can improve the performance of the L_OcrPage_Recognize function.

The default value of 0 (zero) instructs LEADTOOLS to use the system thread pool.

To disable multi-threading inside L_OcrPage_Recognize, set Recognition.Threading.MaximumThreads to 1. Any other value is treated as 0 (use the thread pool).

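
Because every value other than 1 is treated as 0, the setting behaves like an on/off switch rather than a thread count. A minimal sketch of that rule, using a hypothetical helper name that is not part of the LEADTOOLS API:

```c
#include <stdbool.h>

/* Interprets a MaximumThreads value as described above:
   1 disables multi-threading; 0 and every other value
   mean "use the system thread pool". */
static bool uses_thread_pool(int maximum_threads)
{
    return maximum_threads != 1;
}
```

With this rule, a value such as 8 does not cap recognition at eight threads; it selects the thread pool just as 0 does.
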
End:Recognition.Threading

EndCategory

N/A

End of the recognition thread settings category.

Recognition.PreProcessing

BeginCategory

N/A

Beginning of the pre-processing settings category.

Recognition.Preprocess.BlackWhiteImageConversionMethod

Enum

Default, Dynamic, User

Controls how a non-B/W image stored in the Engine is converted to a B/W image. Each method affects grayscale or 24-bit color images and creates a B/W image in the Engine's memory.

Default: Image binarization applies an automatic adaptive thresholding algorithm.

Dynamic: Each pixel is compared to a dynamically calculated threshold. If the pixel intensity is higher than the threshold, the pixel is set to white; otherwise, it is set to black.

User: Thresholding uses a user-defined threshold value, set by the Recognition.Preprocess.BlackWhiteImageConversionThreshold setting.

Recognition.Preprocess.BlackWhiteImageConversionThreshold

Integer

0 to 255

The threshold to use when converting color images to bitonal (black/white) in preparation for recognizing the text on the image. The conversion separates the text intensities from the background intensities.

This is the equivalent of calling L_IntensityDetectBitmap on the image with crInColor equal to the detected foreground (text) color, crOutColor equal to the detected background color, uChannel set to IDB_CHANNEL_MASTER, uHigh equal to 255, and uLow equal to the value of this setting. Default value is 185.
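
The range-based segmentation described above can be sketched on a plain 8-bit grayscale buffer. This is not the LEADTOOLS implementation: the function name is hypothetical, the buffer layout is simplified to one byte per pixel, and treating both bounds as inclusive is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Range-based binarization of an 8-bit grayscale buffer.
   Pixels whose intensity lies in [low, high] receive in_value;
   all other pixels receive out_value. */
static void intensity_detect(const uint8_t *src, uint8_t *dst, size_t count,
                             uint8_t low, uint8_t high,
                             uint8_t in_value, uint8_t out_value)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i] = (src[i] >= low && src[i] <= high) ? in_value : out_value;
    }
}
```

Calling it with low set to 185 and high set to 255 mirrors the default threshold: intensities from 185 upward map to one value and everything darker maps to the other.
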

Recognition.Preprocess.MobileImagePreprocess

Boolean

N/A

True to enable mobile image processing mode; otherwise, false. By default, the OCR engine tries to upscale images with a low resolution (DPI). However, on most mobile devices the camera takes pictures with a low resolution (for example, 72 DPI) and a large size in pixels, so having the OCR engine upscale these images results in undesired memory consumption. If you are using the OCR engine to process images from a mobile camera, set the value of this setting to true.

Recognition.Preprocess.DownSampleLargeImage

Boolean

N/A

True to down sample large images prior to recognition; otherwise, false. Set the value of this setting to true to force the OCR engine to not create processing images (the image used for recognition) larger than 4000 by 4000 pixels to preserve memory and resources. This value is ignored if the value of the MobileImagePreprocess setting is true.
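
The 4000 by 4000 pixel limit can be illustrated with a small size calculation. This is a sketch under stated assumptions, not the engine's actual algorithm: the function name is hypothetical, and it assumes the limit applies to each side and that the aspect ratio is preserved.

```c
#include <stdint.h>

#define MAX_PROCESSING_DIMENSION 4000u  /* limit stated in the text above */

/* Shrinks (width, height) in place so that neither side exceeds the limit.
   Assumption: the aspect ratio is preserved and sizes are rounded down. */
static void fit_processing_size(uint32_t *width, uint32_t *height)
{
    uint32_t w = *width, h = *height;

    if (w <= MAX_PROCESSING_DIMENSION && h <= MAX_PROCESSING_DIMENSION)
        return;  /* already small enough, leave unchanged */

    if (w >= h) {
        *height = (uint32_t)(((uint64_t)h * MAX_PROCESSING_DIMENSION) / w);
        *width  = MAX_PROCESSING_DIMENSION;
    } else {
        *width  = (uint32_t)(((uint64_t)w * MAX_PROCESSING_DIMENSION) / h);
        *height = MAX_PROCESSING_DIMENSION;
    }

    /* Guard against a zero-sized side for extreme aspect ratios. */
    if (*width == 0u)  *width = 1u;
    if (*height == 0u) *height = 1u;
}
```

For example, an 8000 by 6000 pixel photo would be processed at 4000 by 3000 pixels under these assumptions, while a 3000 by 2000 pixel image is left unchanged.
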

Recognition.Preprocess.UseZoningEngine

Boolean

N/A

True to use the zoning engine to exclude graphics areas from preprocessing calculations such as deskew and auto-rotate; otherwise, false.

Recognition.Preprocess.MinimumAutoRotateConfidence

Integer

0 to 100

Used by L_OcrPage_AutoPreprocess to determine the minimum confidence percentage threshold to use when orienting pages. Default value is 26.

Recognition.Preprocess.ModifyOriginalImageOptions

Enum

None, Deskew, Rotate, Invert

Specifies how the original image is modified when L_OcrPage_AutoPreprocess is called. Default value is Deskew | Rotate | Invert.

Value

Description

None

Never modify the original image

Deskew

Apply any angle found while deskewing (L_OcrPage_GetDeskewAngle) on the original image

Rotate

Apply the angle (always a right angle) found while performing auto-orient (L_OcrPage_GetRotateAngle) on the original image

Invert

Apply the inversion value (L_OcrPage_IsInverted) on the original image

These options are useful when saving a document with the image-over-text option (such as the one supported by PDF). In this scenario, it may be preferable to overlay the original image without any modification that might affect its size. The only option that should be left set in this case is Rotate. L_OcrAutoRecognizeManager will automatically set the value of this setting to "Rotate" if the final document format has image-over-text support.
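
The flag behavior described above can be sketched with ordinary bit flags. The enum and its numeric values are hypothetical stand-ins chosen for illustration; the real constants are defined by LEADTOOLS and may differ.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
enum modify_original_image_options {
    MODIFY_NONE   = 0,
    MODIFY_DESKEW = 1 << 0,
    MODIFY_ROTATE = 1 << 1,
    MODIFY_INVERT = 1 << 2
};

/* Returns the options to use for a given output format:
   only Rotate when the format overlays the image over the text,
   otherwise the caller's options unchanged. */
static unsigned options_for_output(unsigned options, bool image_over_text)
{
    return image_over_text ? (unsigned)MODIFY_ROTATE : options;
}
```

Starting from the default combination MODIFY_DESKEW | MODIFY_ROTATE | MODIFY_INVERT, an image-over-text format reduces the value to MODIFY_ROTATE, so the overlaid image keeps its original size apart from right-angle rotation.
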

End:Recognition.PreProcessing

EndCategory

N/A

End of the pre-processing settings category.

Recognition.Zoning

BeginCategory

N/A

Beginning of the zoning settings category.

Recognition.Zoning.DisableMultiThreading

Boolean

N/A

True to disable multi-threading when performing auto-zoning; otherwise multi-threading is enabled. Multi-threading enhances the performance of the auto-zoning algorithm. However, it may be undesirable if the OCR engine is hosted in a server.

Recognition.Zoning.CropZoneImage

Boolean

N/A

If this flag is set to true then the Advantage engine will crop each zone from the original image and recognize it. This can improve the performance of the L_OcrPage_Recognize method.

Recognition.Zoning.DetectZoneRotationAngle

Boolean

N/A

If this value is set to True, then the engine will try to detect a separate rotation angle for each zone. Default value is False.

Recognition.Zoning.Options

Enum

None, Detect Text, Detect Graphics, Detect Table, Allow Overlap, Detect Accurate Zones, Recognize One Cell Table, Table Cells as Zones, Use Advanced Table Detection, Use Text Extractor, Detect Checkbox

These flags affect the way L_OcrPage_AutoZone works. Values can be OR-ed. Possible values are:

Value

Meaning

None

If this is the only flag that is set, the engine will use default values to perform auto zoning.

Detect Text

Search for text zones inside the page image.

Detect Graphics

Search for graphic zones inside the page image.

Detect Table

Search for table zones inside the page image.

Allow Overlap

Allow zones to overlap; otherwise detected zones will not overlap.

Detect Accurate Zones

Detect smaller and more accurate zones (like page paragraphs). Unless this flag is set the auto zone algorithm will try to detect the largest possible zones.

Recognize One Cell Table

Detect tables that have only one cell as tables. Must be OR'ed with "Detect Table".

Table Cells as Zones

Treat each cell detected inside a table as its own zone. If this option is set, the zone types will be L_OcrZoneType_Text instead of L_OcrZoneType_Table. Must be OR'ed with "Detect Table".

Use Advanced Table Detection

Use advanced table detection for most accurate results when the document contains tables. This option will recursively and aggressively parse the document to locate the tables and cells position. Using this option will generate the most accurate representation of the original document and its tables in the final output. This option must be OR'ed with "Detect Table".

Use Text Extractor

Improves text zone recognition, extracting text by separating graphics and tables from text areas.

Detect Checkbox

Search for checkbox zones inside the page image.
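
Several of the flags above are only meaningful when OR'ed with "Detect Table". A minimal sketch of checking that dependency before applying a combination follows; the enum names and bit values are hypothetical and do not come from the LEADTOOLS headers.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
enum zoning_options {
    ZONING_NONE                         = 0,
    ZONING_DETECT_TEXT                  = 1 << 0,
    ZONING_DETECT_GRAPHICS              = 1 << 1,
    ZONING_DETECT_TABLE                 = 1 << 2,
    ZONING_ALLOW_OVERLAP                = 1 << 3,
    ZONING_DETECT_ACCURATE_ZONES        = 1 << 4,
    ZONING_RECOGNIZE_ONE_CELL_TABLE     = 1 << 5,
    ZONING_TABLE_CELLS_AS_ZONES         = 1 << 6,
    ZONING_USE_ADVANCED_TABLE_DETECTION = 1 << 7,
    ZONING_USE_TEXT_EXTRACTOR           = 1 << 8,
    ZONING_DETECT_CHECKBOX              = 1 << 9
};

/* True when every table-specific flag is accompanied by Detect Table. */
static bool zoning_options_are_consistent(unsigned options)
{
    const unsigned needs_table = ZONING_RECOGNIZE_ONE_CELL_TABLE |
                                 ZONING_TABLE_CELLS_AS_ZONES |
                                 ZONING_USE_ADVANCED_TABLE_DETECTION;

    if ((options & needs_table) != 0u && (options & ZONING_DETECT_TABLE) == 0u)
        return false;
    return true;
}
```

For instance, ZONING_DETECT_TEXT | ZONING_DETECT_TABLE | ZONING_TABLE_CELLS_AS_ZONES passes the check, while ZONING_TABLE_CELLS_AS_ZONES on its own does not.
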

Recognition.Zoning.EnableDoubleZoning

Boolean

N/A

If this flag is set to true, then the Advantage engine will perform a second internal auto-zoning pass on each text zone to generate more homogeneous zones for recognition. This can improve the performance of the L_OcrPage_Recognize function.

End:Recognition.Zoning

EndCategory

N/A

End of the zoning settings category

Recognition.Words

BeginCategory

N/A

Beginning of the word recognition settings category

Recognition.Words.DiscardLowConfidenceWords

Boolean

N/A

This setting controls the output. If True, words and characters with a low rating (rubbish words or characters) are not included when saving the recognition results to any of the document formats supported by LEADTOOLS.

Recognition.Words.DiscardLowConfidenceZones

Boolean

N/A

This setting controls the output. If True, the engine will check all the words and characters in a zone. If it determines that the overall confidence and the type of characters constitute noise, then the recognition results for the whole zone are discarded. Default value is False.

Recognition.Words.LowWordConfidence

Integer

0 to 100

Discard any word with a confidence value less than this value. This setting only takes effect when DiscardLowConfidenceWords is set to true.
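
The interaction between DiscardLowConfidenceWords and LowWordConfidence can be sketched as a simple filter. The struct and function below are hypothetical and only model the rule stated above; they are not the engine's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical word record, for illustration only. */
struct ocr_word {
    const char *text;
    int confidence;  /* 0 to 100 */
};

/* Compacts the array in place, dropping words whose confidence is
   less than low_word_confidence. Does nothing when discarding is
   disabled. Returns the number of words kept. */
static size_t filter_words(struct ocr_word *words, size_t count,
                           bool discard_low_confidence_words,
                           int low_word_confidence)
{
    if (!discard_low_confidence_words)
        return count;

    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (words[i].confidence >= low_word_confidence)
            words[kept++] = words[i];
    }
    return kept;
}
```

A word whose confidence equals the threshold is kept, because only words with a confidence less than the threshold are discarded.
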

End:Recognition.Words

EndCategory

N/A

End of the words recognition settings category.

Recognition.Adaption

BeginCategory

N/A

Beginning of the recognition adaption settings category.

Recognition.Adaption.AdaptedDataFilePath

Boolean

N/A

Not used in this version of LEADTOOLS

End:Recognition.Adaption

EndCategory

N/A

End of the recognition adaption settings category.

Recognition.CharacterFilter

BeginCategory

N/A

Beginning of the recognition character filters category.

Recognition.CharacterFilter.MinimumPixelWidth

Integer

0 to 2147483647

Minimum width of a recognized character in pixels.

Recognition.CharacterFilter.MinimumPixelHeight

Integer

0 to 2147483647

Minimum height of a recognized character in pixels.

Recognition.CharacterFilter.MinimumPixelSizeExcludeCharacters

String

No maximum and can be null

Characters to exclude from the minimum pixel width and height rule.
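
How the minimum size rule and the exclusion list might combine is sketched below. This is an assumption-laden illustration, not the engine's logic: the function name is hypothetical, characters are simplified to single-byte char values, and a character is assumed to pass when both dimensions are at least the minimum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when a recognized character satisfies the minimum size rule,
   or is listed in exclude_characters (which may be NULL). */
static bool passes_minimum_size(char ch, int width_px, int height_px,
                                int min_width_px, int min_height_px,
                                const char *exclude_characters)
{
    /* strchr would match the terminator for '\0', so guard against it. */
    if (exclude_characters != NULL && ch != '\0' &&
        strchr(exclude_characters, ch) != NULL)
        return true;

    return width_px >= min_width_px && height_px >= min_height_px;
}
```

Listing small punctuation such as "." and "," in the exclusion string keeps those marks from being rejected by a size rule tuned for letters and digits.
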

Recognition.CharacterFilter.DiscardNoiseLikeCharacters

Boolean

N/A

Ignore recognized characters that have features similar to noise.

Recognition.CharacterFilter.PostprocessMICR

Boolean

N/A

If the value of this setting is True, then the engine will post-process any MICR zones by discarding all the characters, numbers, and symbols that do not belong to the MICR character set, and by performing basic checks on the validity of the data. Default value is True.

End:Recognition.CharacterFilter

EndCategory

N/A

End of the recognition character filters category.

Recognition.Fonts

BeginCategory

N/A

Beginning of the fonts category.

Recognition.Fonts.EnableCapsCaps

Boolean

N/A

Enable Caps/Caps font recognition enhancements.

Recognition.Fonts.DetectFontStyles

Enum

None, Bold, Italic, Underline, SansSerif, Serif, Proportional, Superscript, Subscript, Strikeout

Enable or disable the detection of specific font properties. These flags affect the final generated document if the format supports fonts such as PDF or DOC. Values can be OR-ed. Possible values are:

Value

Meaning

None

Do not detect any font styles.

Bold

Detect bold font styles.

Italic

Detect italic font styles.

Underline

Detect underline font styles.

SansSerif

Detect Sans-Serif font styles (such as Arial).

Serif

Detect Serif font styles (such as Times New Roman).

Proportional

Detect proportional font styles (such as Times New Roman or Arial) or fixed space font styles (such as Courier New).

Superscript

Detect superscript font styles.

Subscript

Detect subscript font styles.

Strikeout

Detect strikeout font styles.

Recognition.Fonts.RecognizeFontAttributes

Boolean

N/A

Enable font attributes recognition. Disabling it can improve the speed of the L_OcrPage_Recognize method.

End:Recognition.Fonts

EndCategory

N/A

End of the fonts category.

Recognition.AutoRecognizeManager

BeginCategory

N/A

Beginning of the auto-recognize manager category.

Recognition.AutoRecognizeManager.FormatSpeedOptimized

Boolean

N/A

Enable optimizing the recognition speed based on the final document format. For example, the OCR engine will not recognize font attributes such as italic or bold if the final format is Text.

Recognition.AutoRecognizeManager.DefaultDocumentOrientation

Enum

None, Portrait, Landscape

Default orientation for the generated document if a page is blank or graphics only. Possible values are:

Value

Meaning

None

Do not change the orientation.

Portrait

Try to change into portrait (make the width less than height) if the page is empty or contains graphics only.

Landscape

Try to change into landscape (make the width greater than height) if the page is empty or contains graphics only.
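
The orientation rule for blank or graphics-only pages can be sketched as a conditional swap of the page dimensions. The names are hypothetical, and swapping width and height is an assumption about how the change is made.

```c
/* Hypothetical names, for illustration only. */
enum default_document_orientation {
    ORIENTATION_NONE,
    ORIENTATION_PORTRAIT,
    ORIENTATION_LANDSCAPE
};

/* Swaps width and height in place when the page does not already
   match the requested orientation. ORIENTATION_NONE changes nothing. */
static void apply_default_orientation(enum default_document_orientation orientation,
                                      double *width, double *height)
{
    int swap = (orientation == ORIENTATION_PORTRAIT && *width > *height) ||
               (orientation == ORIENTATION_LANDSCAPE && *width < *height);

    if (swap) {
        double tmp = *width;
        *width = *height;
        *height = tmp;
    }
}
```

Under this sketch, a blank page of 11 by 8.5 inches becomes 8.5 by 11 inches with Portrait, and a page that already matches the requested orientation is left as it is.
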

End:Recognition.AutoRecognizeManager

EndCategory

N/A

End of the auto-recognize manager category.

End:Recognition

EndCategory

N/A

End of the recognition settings category.

SpellChecker

BeginCategory

N/A

Beginning of the spell checker category.

SpellChecker.MaximumDictionaries

Integer

0 to 255

The maximum number of spell checkers to use at the same time. Default value is the number of available dictionaries found on the system.

End:SpellChecker

EndCategory

N/A

End of the spell checker category.

Help Version 19.0.2017.10.27
© 1991-2017 LEAD Technologies, Inc. All Rights Reserved.
LEADTOOLS Advantage OCR C API Help