LEAD Technologies, Inc

LEADTOOLS OCR Advantage Engine Settings

The IOcrSettingManager.GetSettingNames method returns the names of the settings described in this table, in the same order.

The following table describes the settings supported by the LEADTOOLS OCR Advantage Engine:

Name Type Range and values Description
Recognition BeginCategory N/A Beginning of the recognition settings category
Recognition.Threading BeginCategory N/A Beginning of the recognition thread settings category
Recognition.Threading.MaximumThreads Integer 0 to Int32.MaxValue

Gets or sets the maximum number of threads to use in recognition. The LEADTOOLS Advantage OCR engine provides support for recognizing document zones in separate threads. This can improve the performance of the IOcrPage.Recognize method.

The default value of 0 (zero) instructs LEADTOOLS to use as many threads as there are cores available on the current machine.

A value of 1 or greater instructs LEADTOOLS to use at most that many threads. If you do not wish to use multi-threading inside the IOcrPage.Recognize method, set Recognition.Threading.MaximumThreads to 1.

Note that if this setting has any value other than 1 (1 disables multi-threading) and Recognition.Threading.UseThreadPool is set to True, the .NET Thread Pool is used instead of regular Win32 threads. In that case the actual number of worker threads is a dynamic value determined by the Thread Pool, depending on the current system workload and the number of free cores.

Recognition.Threading.UseThreadPool Boolean N/A

True to use the .NET ThreadPool; otherwise, Win32 threads are used. The value of this setting is used only if Recognition.Threading.MaximumThreads is set to a value other than 1 (1 disables multi-threading).
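
The interaction between the two threading settings can be summarized in a minimal Python sketch. It only models the rules stated above and does not call the LEADTOOLS API; the function name effective_threading and the scheduler labels are hypothetical.

```python
import os
from typing import Tuple


def effective_threading(maximum_threads: int, use_thread_pool: bool) -> Tuple[int, str]:
    """Model how MaximumThreads and UseThreadPool combine (illustrative only)."""
    if maximum_threads < 0:
        raise ValueError("MaximumThreads must be 0 or greater")
    if maximum_threads == 1:
        # 1 disables multi-threading, so UseThreadPool is ignored.
        return 1, "single-thread"
    # 0 means "one thread per available core"; any other value is used as given.
    threads = maximum_threads if maximum_threads > 0 else (os.cpu_count() or 1)
    # With the pool enabled, the pool itself decides the actual worker count.
    return threads, "thread-pool" if use_thread_pool else "win32-threads"
```

For example, effective_threading(1, True) reports a single thread regardless of the pool flag, while effective_threading(0, False) reports one native thread per core.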

End:Recognition.Threading EndCategory N/A End of the recognition thread settings category.
Recognition.PreProcessing BeginCategory N/A Beginning of the pre-processing settings category
Recognition.Preprocess.BlackWhiteImageConversionMethod Enum Default, Dynamic, User

This setting controls how a non-B/W image stored in the engine is converted to a B/W image. All three methods apply to grayscale and 24-bit color images and create the B/W image in the engine's memory.

Default: Binarization applies an automatic adaptive thresholding algorithm.

Dynamic: Each pixel is compared to a dynamically calculated threshold. If the pixel intensity is higher than the threshold, the pixel is set to white; otherwise it is set to black.

User: Binarization applies a user-defined threshold value, set by the Recognition.Preprocess.BlackWhiteImageConversionThreshold setting.

Recognition.Preprocess.BlackWhiteImageConversionThreshold Integer 0 to 255

The threshold to use when converting color images to bitonal (black and white) in preparation for recognizing the text on the image. The conversion separates the text intensities from the background intensities.

This is the equivalent of calling IntensityDetectCommand on the image with InColor set to the detected foreground (text) color, OutColor set to the detected background color, Channel set to "Master", HighThreshold set to 255, and LowThreshold set to the value of this setting. The default value is 185.
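
As a rough illustration of the range test described above, the following Python sketch maps every intensity inside the low/high range to one color and everything else to another. It is not the IntensityDetectCommand implementation; the function name, the inclusive bounds, and the string color labels are assumptions made for the example.

```python
def intensity_detect(rows, low, high, in_color, out_color):
    """Return a copy of rows where values in [low, high] become in_color.

    Assumption: both bounds are inclusive. Values outside the range
    become out_color.
    """
    return [
        [in_color if low <= value <= high else out_color for value in row]
        for row in rows
    ]


# A tiny 2x2 grayscale "image" and the default threshold of 185.
gray = [[30, 190], [185, 184]]
result = intensity_detect(gray, low=185, high=255,
                          in_color="text", out_color="background")
```

With these inputs only the values 190 and 185 fall inside the range, so they are the ones assigned the in_color label.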

End:Recognition.PreProcessing EndCategory N/A End of the pre-processing settings category.
Recognition.Zoning BeginCategory N/A Beginning of the zoning settings category.
Recognition.Zoning.DisableMultiThreading Boolean N/A

True to disable multi-threading when performing auto-zoning; otherwise, multi-threading is enabled. Multi-threading improves the performance of the auto-zoning algorithm. However, it may be undesirable if the OCR engine is hosted on a server.

Recognition.Zoning.CropZoneImage Boolean N/A

If this flag is set to true then the Advantage engine will crop each zone from the original image and recognize it. This can improve the performance of the IOcrPage.Recognize method.

Recognition.Zoning.Options Enum None, Detect Text, Detect Graphics, Detect Table, Allow Overlap, Detect Accurate Zones, Use Text Extractor These flags affect the way the IOcrPage.AutoZone method works. Values can be OR-ed. Possible values are:
Value Meaning
None If this is the only flag that is set, the engine will use default values to perform auto-zoning.
Detect Text Search for text zones inside the page image.
Detect Graphics Search for graphic zones inside the page image.
Detect Table Search for table zones inside the page image.
Allow Overlap Allow zones to overlap; otherwise detected zones will not overlap.
Detect Accurate Zones Detect smaller and more accurate zones (like page paragraphs). Unless this flag is set, the auto-zone algorithm will try to detect the largest possible zones.
Recognize One Cell Table Detect tables that have only one cell as tables. Must be OR'ed with "Detect Table".
Table Cells as Zones Treat each cell detected inside a table as its own zone. If this option is set, the zone types will be OcrZoneType.Text instead of OcrZoneType.Table. Must be OR'ed with "Detect Table".
Use Advanced Table Detection Use advanced table detection for the most accurate results when the document contains tables. This option will recursively and aggressively parse the document to locate the positions of the tables and cells. Using this option will generate the most accurate representation of the original document and its tables in the final output. Must be OR'ed with "Detect Table".
Use Text Extractor Improves text zone recognition, extracting text by separating graphics and tables from text areas.
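
The OR-ing of zoning flags, and the rule that the table-related options must be combined with "Detect Table", can be sketched in Python as follows. The numeric values, the ZoningOptions type, and the missing_detect_table helper are hypothetical and chosen only for illustration; they are not the values used by the engine.

```python
from enum import IntFlag


class ZoningOptions(IntFlag):
    # Hypothetical bit values; the real engine values are not given here.
    NONE = 0
    DETECT_TEXT = 1
    DETECT_GRAPHICS = 2
    DETECT_TABLE = 4
    ALLOW_OVERLAP = 8
    DETECT_ACCURATE_ZONES = 16
    RECOGNIZE_ONE_CELL_TABLE = 32
    TABLE_CELLS_AS_ZONES = 64
    USE_ADVANCED_TABLE_DETECTION = 128
    USE_TEXT_EXTRACTOR = 256


# Options documented as "Must be OR'ed with Detect Table".
TABLE_DEPENDENT = (
    ZoningOptions.RECOGNIZE_ONE_CELL_TABLE
    | ZoningOptions.TABLE_CELLS_AS_ZONES
    | ZoningOptions.USE_ADVANCED_TABLE_DETECTION
)


def missing_detect_table(options: ZoningOptions) -> bool:
    """True if a table-dependent flag is set without DETECT_TABLE."""
    needs_table = bool(options & TABLE_DEPENDENT)
    return needs_table and not (options & ZoningOptions.DETECT_TABLE)


# Combine flags with bitwise OR, as the table describes.
options = ZoningOptions.DETECT_TEXT | ZoningOptions.TABLE_CELLS_AS_ZONES
```

Here missing_detect_table(options) reports a problem until ZoningOptions.DETECT_TABLE is OR-ed into the combination.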
Recognition.Zoning.EnableDoubleZoning Boolean N/A

If this flag is set to true, the Advantage engine will perform a second internal auto-zoning pass on each text zone to generate more homogeneous zones for recognition. This can improve the performance of the IOcrPage.Recognize method.

End:Recognition.Zoning EndCategory N/A End of the zoning settings category
Recognition.Words BeginCategory N/A Beginning of the word recognition settings category
Recognition.Words.DiscardLowConfidenceWords Boolean N/A This setting controls the output. If true, words/characters with a low rating (rubbish words/characters) will not be included when saving the recognition results to any of the document formats supported by LEADTOOLS.
Recognition.Words.LowWordConfidence Integer 0 to 100

Discard any word with a confidence value less than this value. This setting only takes effect when DiscardLowConfidenceWords is set to true.
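
A minimal Python sketch of how these two settings combine is shown below. It models only the rule stated above; the function name words_to_save and the (text, confidence) tuple layout are assumptions for the example, not part of the LEADTOOLS API.

```python
def words_to_save(words, discard_low_confidence, low_word_confidence):
    """Filter (text, confidence) pairs the way the two settings describe.

    Words are dropped only when discarding is enabled and their
    confidence is strictly less than the threshold.
    """
    if not 0 <= low_word_confidence <= 100:
        raise ValueError("LowWordConfidence must be between 0 and 100")
    if not discard_low_confidence:
        # The threshold has no effect unless discarding is turned on.
        return list(words)
    return [(text, conf) for text, conf in words if conf >= low_word_confidence]


recognized = [("Invoice", 96), ("#@~", 12), ("Total", 50)]
kept = words_to_save(recognized, discard_low_confidence=True, low_word_confidence=50)
```

A word whose confidence equals the threshold is kept, because only values less than the threshold are discarded.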

End:Recognition.Words EndCategory N/A End of the words recognition settings category.
Recognition.Adaption BeginCategory N/A Beginning of the recognition adaption settings category.
Recognition.Adaption.AdaptedDataFilePath Boolean N/A

Enables/disables learning of the Advantage OCR engine. When true, all character features are saved to a file after each recognition. At the next recognition this file is loaded in order to use the saved adapted character features, resulting in better recognition results.

Notes:

1- When performing form recognition, it is recommended that you turn this feature off. This way you compare the form against the original results.

2- When performing form processing, it is recommended you turn this feature on. This way, since the forms are identical, you get better OCR results with subsequent forms.

End:Recognition.Adaption EndCategory N/A End of the recognition adaption settings category.
Recognition.CharacterFilter BeginCategory N/A Beginning of the recognition character filters category.
Recognition.CharacterFilter.MinimumPixelWidth Integer 0 to Int32.MaxValue

Minimum width of a recognized character in pixels.

Recognition.CharacterFilter.MinimumPixelHeight Integer 0 to Int32.MaxValue

Minimum height of a recognized character in pixels.

Recognition.CharacterFilter.MinimumPixelSizeExcludeCharacters String No maximum and can be null

Characters to exclude from the minimum pixel width and height rule.
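
The three size-related settings can be read together as one rule, sketched here in Python. This is an interpretation of the descriptions above, not engine code; the function name passes_size_filter and its parameters are hypothetical.

```python
def passes_size_filter(char, width, height, min_width, min_height, exclude_chars):
    """Return True if a recognized character survives the size rule.

    exclude_chars may be None or an empty string, meaning no exemptions.
    """
    if exclude_chars and char in exclude_chars:
        # Exempt characters (for example small punctuation) skip the rule.
        return True
    return width >= min_width and height >= min_height
```

For example, with a minimum size of 3 by 5 pixels and an exclusion string of ".,", a 2 by 2 pixel period is kept while a 2 by 8 pixel letter is filtered out.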

Recognition.CharacterFilter.DiscardNoiseLikeCharacters Boolean N/A

Ignore recognized characters that have features similar to noise.

End:Recognition.CharacterFilter EndCategory N/A End of the recognition character filters category.
Recognition.Fonts BeginCategory N/A Beginning of the fonts category.
Recognition.Fonts.EnableCapsCaps Boolean N/A

Enable Caps/Caps font recognition enhancements.

Recognition.Fonts.DetectFontStyles Enum None, Bold, Italic, Underline, SansSerif, Serif, Proportional, Superscript, Subscript Enable or disable the detection of specific font properties. These flags affect the final generated document if the format supports fonts, such as PDF or DOC. Values can be OR-ed. Possible values are:
Value Meaning
None Do not detect any font styles.
Bold Detect bold font style.
Italic Detect italic font style.
Underline Detect underline font style.
SansSerif Detect Sans-Serif font style (such as Arial).
Serif Detect Serif font style (such as Times New Roman).
Proportional Detect proportional font style (such as Times New Roman or Arial) or non-proportional font style (such as Courier New).
Superscript Detect superscript font style.
Subscript Detect subscript font style.
Recognition.Fonts.RecognizeFontAttributes Boolean N/A

Enable font attributes recognition. Disabling it can improve the speed of the IOcrPage.Recognize method.

End:Recognition.Fonts EndCategory N/A End of the fonts category.
End:Recognition EndCategory N/A End of the recognition settings category.
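
Because the names are listed in order, with BeginCategory and EndCategory entries bracketing each group, the nesting of the settings can be recovered by a simple walk. The Python sketch below assumes the (name, type) pairs have already been collected, here hand-copied from the table; how the types are obtained from the engine is outside the sketch, and the function name nest_settings is hypothetical.

```python
def nest_settings(entries):
    """Return (depth, name) pairs for an ordered list of (name, type) entries."""
    depth = 0
    nested = []
    for name, kind in entries:
        if kind == "EndCategory":
            depth -= 1
            if depth < 0:
                raise ValueError("EndCategory without BeginCategory: " + name)
            continue  # End markers close a category and are not listed.
        nested.append((depth, name))
        if kind == "BeginCategory":
            depth += 1
    if depth != 0:
        raise ValueError("a category was never closed")
    return nested


entries = [
    ("Recognition", "BeginCategory"),
    ("Recognition.Threading", "BeginCategory"),
    ("Recognition.Threading.MaximumThreads", "Integer"),
    ("Recognition.Threading.UseThreadPool", "Boolean"),
    ("End:Recognition.Threading", "EndCategory"),
    ("End:Recognition", "EndCategory"),
]
tree = nest_settings(entries)
```

The resulting depth values can be used to indent the names when displaying them in a settings tree or property grid.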

 

 



© 2006-2012 All Rights Reserved. LEAD Technologies, Inc.