LEADTOOLS OCR Advantage Engine Settings

The IOcrSettingManager.GetSettingNames method returns the names of the settings described in this table, in the same order.
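Because the names come back in table order, the BeginCategory and EndCategory rows can be used to recover how the settings nest. The following Python sketch models that idea on a few rows copied by hand from the table. It does not call LEADTOOLS: the (name, type) pairs and the group_by_category helper are illustrative assumptions, not part of the API.

```python
# Illustrative only: rows copied by hand from the settings table.
# group_by_category is a hypothetical helper, not a LEADTOOLS function.
ROWS = [
    ("Recognition", "BeginCategory"),
    ("Recognition.RecognitionModuleTradeoff", "Enum"),
    ("Recognition.Threading", "BeginCategory"),
    ("Recognition.Threading.MaximumThreads", "Integer"),
    ("End:Recognition.Threading", "EndCategory"),
    ("Recognition.Words", "BeginCategory"),
    ("Recognition.Words.LowWordConfidence", "Integer"),
    ("End:Recognition.Words", "EndCategory"),
    ("End:Recognition", "EndCategory"),
    ("SpellChecker", "BeginCategory"),
    ("SpellChecker.EnableCache", "Boolean"),
    ("End:SpellChecker", "EndCategory"),
]


def group_by_category(rows):
    """Map each setting name to the innermost category that encloses it."""
    open_categories = []  # stack of the categories that are currently open
    owner = {}
    for name, kind in rows:
        if kind == "BeginCategory":
            open_categories.append(name)
        elif kind == "EndCategory":
            open_categories.pop()
        else:
            owner[name] = open_categories[-1] if open_categories else None
    return owner


owners = group_by_category(ROWS)
for setting, category in owners.items():
    print(f"{setting} -> {category}")
```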

The following table describes the settings supported by the LEADTOOLS OCR Advantage Engine:

Name Type Range and values Description
Recognition BeginCategory N/A Beginning of the Recognition Settings category
Recognition.RecognitionModuleTradeoff Enum

Recognition module tradeoff between speed and accuracy. The default value is Balanced (1).

Possible values are:

Value Description
(0) Accurate Accuracy is more important than speed.
(1) Balanced Accuracy and speed are equally important.
(2) Fast Speed is more important than accuracy.
Recognition.ShareOriginalImage Boolean N/A

true to share the image reference used to create the OCR page; otherwise, false. The default value is false.

Setting this value to true affects the IOcrPageCollection.AddPage(RasterImage, OcrProgressCallback) and IOcrPageCollection.InsertPage(int, RasterImage, OcrProgressCallback) methods.

When the value is false (the default), these methods make a copy of the image and use the copy to create the page. Calling IOcrPage.GetRasterImage(OcrPageType.Original) on such a page returns a null reference.

When the value is true, these methods use the same image reference to create the page. Calling IOcrPage.GetRasterImage(OcrPageType.Original) on such a page returns the original image reference.

IOcrAutoRecognizeManager temporarily sets the value of this setting to true while performing a recognition job.

Recognition.ModifyProcessingImage Boolean N/A

true to modify the processing image after recognition; otherwise, false. The default value is true.

It is best to set the value of this setting to true if IOcrPage.Recognize is called only once per page.

IOcrAutoRecognizeManager temporarily sets the value of this setting to true while performing a recognition job.

Recognition.DetectColors Boolean N/A

Automatically detect the foreground and background colors of each character. The default value is true.

If this value is true, then the engine tries to automatically detect the colors of the zones when IOcrPage.AutoZone is called and sets the values in OcrZone.ForeColor and OcrZone.BackColor.

Recognition.AutoSecondPass Boolean N/A

Automatically perform second image processing cleanup on the internal black and white image if the first pass did not provide satisfactory results. The default value is true.

Recognition.MaximumPageConventionalMemorySize Integer 0 to Int32.MaxValue

The appropriate setting for Recognition.MaximumPageConventionalMemorySize depends on the system hardware configuration, the number of cores, and the application types being used. Change this setting if out-of-memory errors occur when running your application.

The IOcrEngine supports loading RasterImage objects directly from disk files, streams, or URLs (as, for example, the various methods in the IOcrPageCollection and IOcrAutoRecognizeManager classes do). The loaded RasterImage holds the original image and is useful only when saving graphics zones or image-over-text overlays. If the image is large and is created using conventional memory, a large amount of physical memory is used to hold it and is not available for other purposes such as auto-zoning or recognition. This is more noticeable in multi-threaded applications, where loading several large images in conventional memory can cause out-of-memory errors even when performing operations that would normally succeed.

Use "MaximumPageConventionalMemorySize" to set the maximum size of the image in memory allowed before the IOcrEngine automatically switches to use the disk memory feature of RasterImage (RasterMemoryFlags).

The "MaximumPageConventionalMemorySize" value is in KBytes and its default depends on the processor(s) being used. For x86 processors, the value is 42187 (42 MBytes). For x64 processors, the value is calculated dynamically (1.7 GBytes for each 8 cores, not exceeding the physical memory size). These values allow a typical OCR image (8.5 by 11 inches at 300 DPI and 32 bits per pixel) to be loaded in conventional memory. Anything significantly larger than that is switched to disk memory mode.

Different factors affect the performance of a particular setting and must be weighed. These include the following factors:

  • The speed of the machine's hard drive - a slower drive increases the penalty for using disk memory rather than conventional memory
  • Load time - using disk memory consumes more time loading than using conventional memory
  • Autozoning and Recognition - using disk memory improves the performance of autozoning, recognition and other operations because conventional memory is freed for image processing.
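The sizing claim above (a typical page fits under the x86 default, anything much larger does not) can be checked with a little arithmetic. This Python sketch is not LEADTOOLS code. It assumes 1 KByte = 1024 bytes, unpadded image rows, and that the switch to disk memory happens when the raw size exceeds the limit; the helper names are hypothetical.

```python
# Raw size of an uncompressed image versus the x86 default limit quoted above.
# Assumptions: 1 KByte = 1024 bytes, rows are not padded, and the engine
# switches to disk memory when the size exceeds the limit.
X86_DEFAULT_LIMIT_KBYTES = 42187


def raw_size_kbytes(width_inches, height_inches, dpi, bits_per_pixel):
    """Return the in-memory size of an uncompressed image in KBytes."""
    width_px = round(width_inches * dpi)
    height_px = round(height_inches * dpi)
    return width_px * height_px * bits_per_pixel / 8 / 1024


def uses_disk_memory(size_kbytes, limit_kbytes=X86_DEFAULT_LIMIT_KBYTES):
    """True when the image is too large for conventional memory under the limit."""
    return size_kbytes > limit_kbytes


letter = raw_size_kbytes(8.5, 11, 300, 32)   # the "typical OCR image" from the text
tabloid = raw_size_kbytes(11, 17, 300, 32)   # a page with twice the area

print(f"Letter:  {letter:.0f} KBytes, disk memory: {uses_disk_memory(letter)}")
print(f"Tabloid: {tabloid:.0f} KBytes, disk memory: {uses_disk_memory(tabloid)}")
```

Under these assumptions the Letter page needs about 32871 KBytes, which is below the 42187 KByte limit, while the Tabloid page needs about 65742 KBytes and would be switched to disk memory.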
Recognition.Threading BeginCategory N/A Beginning of the Recognition Thread Settings category
Recognition.Threading.MaximumThreads Integer 0 to Int32.MaxValue

The maximum number of threads to use in recognition. The LEADTOOLS OCR Advantage Engine supports recognizing document zones in separate threads. This can improve the performance of the IOcrPage.Recognize method.

The default value is 0, which instructs LEADTOOLS to use the system thread pool and calculate the number of threads automatically. Setting the value to a number greater than 1 also turns multi-threading on, but the number specified is ignored and the thread count is still calculated automatically.

To turn multi-threading off inside the IOcrPage.Recognize method and use a single thread, set the value to 1.
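The three cases described above (0, 1, and anything greater than 1) are easy to confuse, so here is a small Python model of them. It only restates the rules in the text; the function name and the returned strings are made up for illustration and nothing here calls the engine.

```python
INT32_MAX = 2**31 - 1


def recognition_threading(maximum_threads):
    """Describe how a MaximumThreads value is interpreted, per the text above."""
    if not 0 <= maximum_threads <= INT32_MAX:
        raise ValueError("MaximumThreads must be between 0 and Int32.MaxValue")
    if maximum_threads == 1:
        return "single thread"
    # 0 (the default) and any value greater than 1 both enable
    # multi-threading; the requested count itself is not used.
    return "multi-threaded, thread count chosen automatically"


for value in (0, 1, 8):
    print(value, "->", recognition_threading(value))
```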

End:Recognition.Threading EndCategory N/A End of the Recognition Thread Settings category.
Recognition.PreProcessing BeginCategory N/A Beginning of the Pre-processing Settings category
Recognition.Preprocess.BlackWhiteImageConversionMethod Enum

This setting influences how a non-black and white image, stored in the Engine, is converted to a black and white one. The default value is Default (0).

Possible values are:

Value Description
(0) Default

This affects grayscale or 24-bit color images. A black and white image is created in the Engine's memory. Image binarization applies an automatic adaptive thresholding algorithm.

(1) Dynamic

This affects grayscale or 24-bit color images. A black and white image is created in the Engine's memory. Each pixel is compared to a dynamically-calculated threshold. If the pixel intensity is higher it is set to white; otherwise, it is set to black.

(2) User

This affects grayscale or 24-bit color images. A black and white image is created in the Engine's memory. Thresholding with a user-defined threshold value is performed. Set the threshold with Recognition.Preprocess.BlackWhiteImageConversionThreshold.

Recognition.Preprocess.BlackWhiteImageConversionThreshold Integer 0 to 255

The threshold to use when converting colored images to bitonal (black and white) in preparation for recognizing the text on the image. Conversion separates the text intensities from the background intensities. The default value is 185.

This is the equivalent of calling IntensityDetectCommand on the image with InColor equal to the detected foreground (text) color, OutColor equal to the detected background color, Channel set to Master, HighThreshold equal to 255, and LowThreshold equal to the value of this setting.
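As a rough illustration of the range test just described, the Python sketch below marks which grayscale intensities fall between the low threshold (this setting, default 185) and the high threshold (255). In-range pixels would receive InColor and the rest OutColor. It is a model on plain integers, not a call to IntensityDetectCommand, and treating both bounds as inclusive is an assumption.

```python
def intensity_detect(intensities, low_threshold=185, high_threshold=255):
    """Return True for each intensity inside [low_threshold, high_threshold].

    Assumption: both bounds are inclusive.
    """
    return [low_threshold <= value <= high_threshold for value in intensities]


row = [12, 184, 185, 200, 255]   # one row of grayscale pixel intensities
mask = intensity_detect(row)
print(mask)
```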

Recognition.Preprocess.MobileImagePreprocess Boolean N/A

true to enable mobile image processing mode; otherwise, false. The default value is false.

By design, the OCR engine tries to upscale low resolution (DPI) images. However, most cameras in mobile devices take pictures that report a low resolution (for example, 72 DPI) but have large pixel dimensions. If the OCR engine upscales such images, far more memory than necessary is consumed. If you are using the OCR engine to process images from a mobile camera, set the mobile image processing mode to true.

Recognition.Preprocess.DownSampleLargeImage Boolean N/A

true to downsample large images prior to recognition; otherwise, false. The default value is false.

Set the value to true to prevent the OCR engine from creating processing images (the image used for recognition) larger than 4000 by 4000 pixels (in order to preserve memory and resources). This value is ignored if the value of the MobileImagePreprocess setting is true.
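The interaction between this setting and MobileImagePreprocess can be sketched as follows. This is a Python model only: the text states the 4000 by 4000 cap and the precedence of the mobile setting, but how the engine actually scales the image is not documented here, so the aspect-preserving integer scaling and the function name are assumptions.

```python
MAX_SIDE = 4000  # cap quoted in the text above


def processing_image_size(width, height,
                          down_sample_large_image=False,
                          mobile_image_preprocess=False):
    """Return the (width, height) the processing image would be capped to."""
    # The downsample setting is ignored in mobile mode. Whatever mobile mode
    # itself does to the image size is not modelled here.
    if mobile_image_preprocess or not down_sample_large_image:
        return (width, height)
    if width <= MAX_SIDE and height <= MAX_SIDE:
        return (width, height)
    # Assumed behaviour: shrink the longer side to MAX_SIDE, keep the aspect ratio.
    if width >= height:
        return (MAX_SIDE, max(1, height * MAX_SIDE // width))
    return (max(1, width * MAX_SIDE // height), MAX_SIDE)


print(processing_image_size(8000, 6000, down_sample_large_image=True))
print(processing_image_size(8000, 6000, down_sample_large_image=True,
                            mobile_image_preprocess=True))
```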

Recognition.Preprocess.UseZoningEngine Boolean N/A

true to use the zoning engine to exclude graphics areas from preprocessing calculations such as deskew and auto-rotate; otherwise, false. The default value is true.

Recognition.Preprocess.MinimumAutoRotateConfidence Integer 0 to 100

Used by the IOcrPage.AutoPreprocess method to determine the minimum confidence percentage threshold to use when orienting pages. The default value is 26.

Recognition.Preprocess.ModifyOriginalImageOptions Enum (Flags)

Specifies how the original image is modified when calling the IOcrPage.AutoPreprocess method. The default value is Deskew | Rotate | Invert (0x01 | 0x02 | 0x04 = 0x07).

Values can be OR'ed together. Possible values are:

Value Description
(0) None Never modify the original image.
(0x01) Deskew Apply any angle found while deskewing (IOcrPage.GetDeskewAngle) the original image.
(0x02) Rotate Apply the angle (always a right angle) found while performing auto-orientation (IOcrPage.GetRotateAngle) on the original image (auto-orient).
(0x04) Invert Apply the inversion value (IOcrPage.IsInverted) on the original image.

These options are useful when saving a document that has image over text (such as PDF documents). When saving a document that has image over text it may be preferable to overlay the original image without making any modifications that could affect the size. The only modification that should be allowed in such a case is "Rotate". The IOcrAutoRecognizeManager automatically sets the value to "Rotate" if the final document format supports image over text.
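To make the flag arithmetic concrete, the Python sketch below rebuilds the default value from the table and decodes a numeric value back into flag names. The enum is a hand-written mirror of the table for illustration; it is not the LEADTOOLS type, and enabled_names is a hypothetical helper.

```python
from enum import IntFlag


class ModifyOriginalImageOptions(IntFlag):
    """Hand-written mirror of the values in the table above."""
    NONE = 0x00
    DESKEW = 0x01
    ROTATE = 0x02
    INVERT = 0x04


DEFAULT = (ModifyOriginalImageOptions.DESKEW
           | ModifyOriginalImageOptions.ROTATE
           | ModifyOriginalImageOptions.INVERT)

# The value the text says is used when the output format supports image over text.
IMAGE_OVER_TEXT = ModifyOriginalImageOptions.ROTATE


def enabled_names(value):
    """List the individual flags that are set in value."""
    single_flags = (ModifyOriginalImageOptions.DESKEW,
                    ModifyOriginalImageOptions.ROTATE,
                    ModifyOriginalImageOptions.INVERT)
    return [flag.name for flag in single_flags if flag & value]


print(hex(DEFAULT), enabled_names(DEFAULT))
print(hex(IMAGE_OVER_TEXT), enabled_names(IMAGE_OVER_TEXT))
```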

Recognition.Preprocess.RemoveInvertedTextRegionsFromProcessImage Boolean N/A

true to automatically detect and recognize inverted regions; otherwise, false. The default value is false.

Set this value to true to support OCRing an image that contains both black/white and white/black areas.

End:Recognition.PreProcessing EndCategory N/A End of the Pre-processing Settings category.
Recognition.Zoning BeginCategory N/A Beginning of the Zoning Settings category.
Recognition.Zoning.DisableMultiThreading Boolean N/A

true to disable multi-threading when performing auto-zoning; otherwise, multi-threading is enabled. The default value is false.

Multi-threading enhances the performance of the auto-zoning algorithm. However, it can be undesirable if the OCR engine is hosted on a server.

Recognition.Zoning.CropZoneImage Boolean N/A

true to crop each zone from the original image and recognize it by itself; otherwise, false. The default value is true.

Setting this value to true can improve the performance of the IOcrPage.Recognize method.

Recognition.Zoning.DetectZoneRotationAngle Boolean N/A

true to try to detect a separate rotation angle for each zone; otherwise, false. The default value is false.

Setting this value to false can increase the speed of the recognition engine.

Recognition.Zoning.DetectVerticalZones Enum

Vertical zone detection mode. This works with Latin and Asian languages.

Possible values are:

Value Description
(0) Auto Automatic - currently this means on for Asian languages such as Japanese and off for Latin languages such as English.
(1) On On - currently this means on for both Asian languages such as Japanese and Latin languages such as English.
(2) Off Off - currently this means off for both Asian languages such as Japanese and Latin languages such as English.
Recognition.Zoning.Options Enum (Flags)

These flags affect the way the IOcrPage.AutoZone method works. The default value is Detect Text | Detect Graphics | Detect Table | Detect Accurate Zones | Table Cells as Zones | Use Advanced Table Detection | Use Text Extractor | Favor Graphics (0x01 | 0x02 | 0x04 | 0x10 | 0x40 | 0x80 | 0x100 | 0x400 = 0x5D7).

Values can be OR'ed together. Possible values are:

Value Description
(0) None No options. The engine will not detect any zones and hence no recognition is performed.
(0x01) Detect Text Search for text zones inside the page image.
(0x02) Detect Graphics Search for graphic zones inside the page image.
(0x04) Detect Table Search for table zones inside the page image.
(0x08) Allow Overlap Allow zones to overlap; otherwise, detected zones will not overlap.
(0x10) Detect Accurate Zones Detect smaller and more accurate zones (like page paragraphs). Unless this flag is set the auto-zone algorithm tries to detect the largest possible zones.
(0x20) Recognize One Cell Table Even if a table has only one cell, detect it as a table. Must be OR'ed with Detect Table.
(0x40) Table Cells as Zones Treat each cell detected inside a table as its own zone. If this option is set, the zone types are OcrZoneType.Text instead of OcrZoneType.Table. Must be OR'ed with Detect Table.
(0x80) Use Advanced Table Detection Use advanced table detection to obtain the most accurate results when the document contains tables. This option recursively and aggressively parses the document to locate the positions of tables and cells. Using this option generates the most accurate representation of the original document and its tables in the final output. This option must be OR'ed with Detect Table.
(0x100) Use Text Extractor Improves text zone recognition. Extracts text by separating graphics and tables from text areas.
(0x200) Detect Checkbox Search for checkbox zones inside the page image.
(0x400) Favor Graphics Favor converting a blob with very low accuracy into a contiguous graphics zone instead of text. This option is on by default and results in a better visual representation of the output documents. The OCR engine turns this option off when performing Auto Recognize with the Text or PDF with Image over Text options set. In this mode, the engine assumes that the result should contain all text parsed, regardless of quality.
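Several of the flags above are documented as requiring Detect Table. The Python sketch below mirrors the table in a hand-written enum (not the LEADTOOLS type) and checks a combination for that dependency. The flags_missing_detect_table helper is hypothetical and only encodes the "must be OR'ed with Detect Table" notes.

```python
from enum import IntFlag


class ZoningOptions(IntFlag):
    """Hand-written mirror of the values in the table above."""
    NONE = 0x000
    DETECT_TEXT = 0x001
    DETECT_GRAPHICS = 0x002
    DETECT_TABLE = 0x004
    ALLOW_OVERLAP = 0x008
    DETECT_ACCURATE_ZONES = 0x010
    RECOGNIZE_ONE_CELL_TABLE = 0x020
    TABLE_CELLS_AS_ZONES = 0x040
    USE_ADVANCED_TABLE_DETECTION = 0x080
    USE_TEXT_EXTRACTOR = 0x100
    DETECT_CHECKBOX = 0x200
    FAVOR_GRAPHICS = 0x400


# Flags the table says must be combined with Detect Table.
NEEDS_DETECT_TABLE = (
    ZoningOptions.RECOGNIZE_ONE_CELL_TABLE,
    ZoningOptions.TABLE_CELLS_AS_ZONES,
    ZoningOptions.USE_ADVANCED_TABLE_DETECTION,
)

DEFAULT = (ZoningOptions.DETECT_TEXT | ZoningOptions.DETECT_GRAPHICS
           | ZoningOptions.DETECT_TABLE | ZoningOptions.DETECT_ACCURATE_ZONES
           | ZoningOptions.TABLE_CELLS_AS_ZONES
           | ZoningOptions.USE_ADVANCED_TABLE_DETECTION
           | ZoningOptions.USE_TEXT_EXTRACTOR | ZoningOptions.FAVOR_GRAPHICS)


def flags_missing_detect_table(options):
    """Return the names of table-related flags set without Detect Table."""
    if options & ZoningOptions.DETECT_TABLE:
        return []
    return [flag.name for flag in NEEDS_DETECT_TABLE if options & flag]


text_only = ZoningOptions.DETECT_TEXT | ZoningOptions.TABLE_CELLS_AS_ZONES
print(hex(DEFAULT), flags_missing_detect_table(DEFAULT))
print(hex(text_only), flags_missing_detect_table(text_only))
```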
Recognition.Zoning.EnableDoubleZoning Boolean N/A

true to perform a second, internal autozoning procedure on each text zone in order to generate more homogeneous zones for recognition; otherwise, false. The default value is true.

Setting this value can improve the performance of the IOcrPage.Recognize method.

End:Recognition.Zoning EndCategory N/A End of the Zoning Settings category
Recognition.Words BeginCategory N/A Beginning of the Word Recognition Settings category
Recognition.Words.DiscardLowConfidenceWords Boolean N/A

true to discard words with a low rating; otherwise, false. The default value is true.

This setting controls the output. If true, the engine checks the confidence of each word and excludes any word with a low rating (lower than the LowWordConfidence value) when saving the recognition results to any of the document formats supported by the LEADTOOLS toolkits.

Recognition.Words.DiscardLowConfidenceZones Boolean N/A

true to discard zones with low ratings; otherwise, false. The default value is false.

This setting controls the output. If true, the engine checks all of the words/characters in a zone. If it determines that the overall confidence and type of characters constitute noise, then the recognition results for the entire zone are discarded.

Recognition.Words.LowWordConfidence Integer 0 to 100

Discard any word with a confidence value less than this value. The default value is 50.

This setting only takes effect when DiscardLowConfidenceWords is set to true.
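The way DiscardLowConfidenceWords and LowWordConfidence work together can be modelled in a few lines of Python. The (text, confidence) pairs and the words_for_output helper are invented for illustration. Only the rule itself (drop words whose confidence is less than the threshold, and only when discarding is enabled) comes from the text.

```python
def words_for_output(words, discard_low_confidence_words=True,
                     low_word_confidence=50):
    """Return the word texts that would be kept when saving results.

    words is a list of (text, confidence) pairs, confidence in 0..100.
    """
    if not discard_low_confidence_words:
        # LowWordConfidence has no effect when discarding is turned off.
        return [text for text, _ in words]
    return [text for text, confidence in words
            if confidence >= low_word_confidence]


recognized = [("Invoice", 96), ("t0ta1", 31), ("Total", 50), ("~;", 12)]

print(words_for_output(recognized))
print(words_for_output(recognized, discard_low_confidence_words=False))
```

With the default threshold of 50, a word whose confidence is exactly 50 is kept, because only values less than the threshold are discarded.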

End:Recognition.Words EndCategory N/A End of the Word Recognition Settings category.
Recognition.Adaption BeginCategory N/A Beginning of the Recognition Adaption Settings category.
Recognition.Adaption.AdaptedDataFilePath Boolean N/A

Not used in this version of LEADTOOLS.

End:Recognition.Adaption EndCategory N/A End of the Recognition Adaption Settings category.
Recognition.CharacterFilter BeginCategory N/A Beginning of the Recognition Character Filters category.
Recognition.CharacterFilter.MinimumPixelWidth Integer 0 to Int32.MaxValue

The minimum width of a recognized character, in pixels. The default value is 6.

Recognition.CharacterFilter.MinimumPixelHeight Integer 0 to Int32.MaxValue

The minimum height of a recognized character, in pixels. The default value is 6.

Recognition.CharacterFilter.MinimumPixelSizeExcludeCharacters String No maximum. Can be null

Characters to exclude from the minimum pixel width and height rule. The default value is ".".
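The three size-related settings above combine into a single rule, sketched here in Python. The helper is hypothetical, and it assumes a character must meet both the minimum width and the minimum height to be kept; the text does not say whether failing one dimension alone is enough to reject it.

```python
def passes_size_filter(character, width, height,
                       minimum_pixel_width=6, minimum_pixel_height=6,
                       exclude_characters="."):
    """Return True if a recognized character survives the minimum-size rule."""
    # Characters on the exclusion list are exempt from the rule.
    # exclude_characters may be None, mirroring "Can be null" in the table.
    if exclude_characters and character in exclude_characters:
        return True
    # Assumption: both dimensions must reach their minimum.
    return width >= minimum_pixel_width and height >= minimum_pixel_height


print(passes_size_filter(".", 3, 3))     # exempt by default
print(passes_size_filter("i", 3, 12))    # too narrow
print(passes_size_filter("A", 10, 12))   # large enough
```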

Recognition.CharacterFilter.DiscardNoiseLikeCharacters Boolean N/A

true to ignore recognized characters that have features similar to noise; otherwise, false. The default value is false.

Recognition.CharacterFilter.DiscardNoisyZones Boolean N/A

true to discard all the results in the zone if the engine determines that all the characters recognized are noise; otherwise, false. The default value is false.

Recognition.CharacterFilter.PostprocessMICR Boolean N/A

true to post-process any MICR zones by discarding all of the characters, numbers and symbols that do not belong to the MICR character set as well as performing basic validity data checking; otherwise, false. The default value is true.

End:Recognition.CharacterFilter EndCategory N/A End of the Recognition Character Filters category.
Recognition.Fonts BeginCategory N/A Beginning of the Fonts category.
Recognition.Fonts.EnableCapsCaps Boolean N/A

true to enable Caps/Caps (CamelCase) font recognition enhancements; otherwise, false. The default value is false.

Recognition.Fonts.DetectFontStyles Enum (Flags)

Enable or disable the detection of specific font properties. These flags affect the final generated document if the format supports fonts (such as the PDF or DOCX formats). The default value is Bold | Italic | Underline | SansSerif | Serif | Proportional | Superscript | Subscript | Strikeout (0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80 | 0x100 = 0x1FF).

Values can be OR'ed together. Possible values are:

Value Description
(0) None Do not detect any font styles.
(0x01) Bold Detect bold font styles.
(0x02) Italic Detect italic font styles.
(0x04) Underline Detect underline font styles.
(0x08) SansSerif Detect Sans-Serif font styles (for example, Arial).
(0x10) Serif Detect Serif font styles (for example, Times New Roman).
(0x20) Proportional Detect proportional font styles (for example, Times New Roman or Arial) or fixed space font styles (for example, Courier New).
(0x40) Superscript Detect super-script font styles.
(0x80) Subscript Detect subscript font styles.
(0x100) Strikeout Detect strikeout font styles.
Recognition.Fonts.RecognizeFontAttributes Boolean N/A

true to enable font attributes recognition; otherwise, false. The default value is true.

Setting this value to false can improve the speed of the IOcrPage.Recognize method if the font attributes of the recognized characters are not required, for instance, if recognition is used to obtain raw text only and not to create a formatted output document.

End:Recognition.Fonts EndCategory N/A End of the Fonts category.
Recognition.AutoRecognizeManager BeginCategory N/A Beginning of the Auto-recognize Manager category.
Recognition.AutoRecognizeManager.FormatSpeedOptimized Boolean N/A

true to optimize the recognition speed based on the format of the final document; otherwise, false. The default value is true.

As an example, the OCR engine does not recognize font attributes such as italic or bold if the final document format is Text.

Recognition.AutoRecognizeManager.DefaultDocumentOrientation Enum

The default orientation for the generated document if a page is blank or entirely graphics. The default value is None.

Possible values are:

Value Description
(0) None Do not change the orientation.
(1) Portrait Try to change to portrait orientation (make the width less than height) if the page is blank or entirely graphics.
(2) Landscape Try to change to landscape orientation (make the width greater than height) if the page is blank or entirely graphics.
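The width and height conditions in the table can be expressed as a short Python model. It is a sketch only: the table says the engine will "try" to change the orientation, and modelling that as swapping width and height is an assumption, as is the helper name.

```python
NONE, PORTRAIT, LANDSCAPE = 0, 1, 2  # values from the table above


def oriented_size(width, height, default_orientation, blank_or_all_graphics):
    """Return the (width, height) after applying the default orientation."""
    # The setting only applies to pages that are blank or entirely graphics.
    if not blank_or_all_graphics or default_orientation == NONE:
        return (width, height)
    if default_orientation == PORTRAIT and width > height:
        return (height, width)   # make the width less than the height
    if default_orientation == LANDSCAPE and height > width:
        return (height, width)   # make the width greater than the height
    return (width, height)


print(oriented_size(3300, 2550, PORTRAIT, blank_or_all_graphics=True))
print(oriented_size(3300, 2550, PORTRAIT, blank_or_all_graphics=False))
```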
End:Recognition.AutoRecognizeManager EndCategory N/A End of the Auto-recognize Manager category.
End:Recognition EndCategory N/A End of the Recognition Settings category.
SpellChecker BeginCategory N/A Beginning of the Spell Checker category.
SpellChecker.MaximumDictionaries Integer 0 to 255

Gets or sets the maximum number of spell checkers to use at the same time. The default value is the number of available dictionaries found in the system.

SpellChecker.EnableCache Boolean N/A

true to enable caching of frequent words; otherwise, false. The default value is true.

End:SpellChecker EndCategory N/A End of the Spell Checker category.

Help Version 19.0.2017.10.27
© 1991-2017 LEAD Technologies, Inc. All Rights Reserved.