LEADTOOLS OCR Advantage Engine Settings

The IOcrSettingManager.GetSettingNames method returns the names of the settings described in the table below, in the same order.

The following table describes the settings supported by the LEADTOOLS OCR Advantage Engine:

Name Type Range and values Description
Recognition BeginCategory N/A Beginning of the recognition settings category
Recognition.ShareOriginalImage Boolean N/A

True to share the image reference used to create the OCR page; otherwise, False. The default value is False.

Setting this value to true will affect the IOcrPageCollection.AddPage(RasterImage, OcrProgressCallback) and IOcrPageCollection.InsertPage(int, RasterImage, OcrProgressCallback) methods.

When the value is false (the default), these methods will make a copy of the image and use the copy to create the page. Calling IOcrPage.GetRasterImage(OcrPageType.Original) on this page will return a null reference.

When the value is true, these methods will use the same image reference to create the page. Calling IOcrPage.GetRasterImage(OcrPageType.Original) on this page will return the original image reference.

Leadtools.Forms.Ocr.IOcrAutoRecognizeManager will temporarily set the value of this setting to True while performing a recognition job.

Recognition.ModifyProcessingImage Boolean N/A

True to modify the processing image after recognition; otherwise, False.

It is recommended to set the value of this setting to True if IOcrPage.Recognize is called only once per page.

Leadtools.Forms.Ocr.IOcrAutoRecognizeManager will temporarily set the value of this setting to True while performing a recognition job.

Recognition.Threading BeginCategory N/A Beginning of the recognition thread settings category
Recognition.Threading.MaximumThreads Integer 0 to Int32.MaxValue

Gets or sets the maximum number of threads to use in recognition. The LEADTOOLS Advantage OCR engine provides support for recognizing document zones in separate threads. This can improve the performance of the IOcrPage.Recognize method.

The default value of 0 (zero) instructs LEADTOOLS to use as many threads as there are cores available on the current machine.

A value of 1 or greater instructs LEADTOOLS to use at most that number of threads. If you do not wish to use multi-threading inside the IOcrPage.Recognize method, set the value of Recognition.Threading.MaximumThreads to 1.

Note that if this setting has any value other than 1 (a value of 1 disables multi-threading) and Recognition.Threading.UseThreadPool is set to True, the .NET Thread Pool is used instead of regular Win32 threads. In that case, the actual number of worker threads is a dynamic value determined by the Thread Pool, depending on the current system workload and the number of free cores.

Recognition.Threading.UseThreadPool Boolean N/A

True to use the .NET ThreadPool; otherwise, Win32 threads are used. Note that this setting is used only if Recognition.Threading.MaximumThreads is set to a value other than 1 (a value of 1 disables multi-threading).
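
As a rough illustration of how the two threading settings above interact, the following Python sketch resolves a MaximumThreads value to an upper bound on worker threads and reports whether the thread pool would come into play. It is not LEADTOOLS code: the function names and the core_count parameter are hypothetical, and the engine's actual scheduling is not modeled.

```python
import os


def effective_max_threads(maximum_threads, core_count=None):
    """Resolve a MaximumThreads-style value to a concrete upper bound.

    0 means "one thread per available core"; any positive value is used as is.
    core_count is a hypothetical override so the rule can be checked
    without depending on the current machine.
    """
    if maximum_threads < 0:
        raise ValueError("maximum_threads must be 0 or greater")
    if core_count is None:
        core_count = os.cpu_count() or 1
    return core_count if maximum_threads == 0 else maximum_threads


def thread_pool_in_effect(maximum_threads, use_thread_pool):
    """UseThreadPool only matters when multi-threading is not disabled (value 1)."""
    return use_thread_pool and maximum_threads != 1


if __name__ == "__main__":
    for setting in (0, 1, 4):
        print(setting, effective_max_threads(setting, core_count=8),
              thread_pool_in_effect(setting, use_thread_pool=True))
```

When the thread pool is in effect, the resolved number is only an upper bound; the pool decides how many workers actually run.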

End:Recognition.Threading EndCategory N/A End of the recognition thread settings category.
Recognition.PreProcessing BeginCategory N/A Beginning of the pre-processing settings category
Recognition.Preprocess.BlackWhiteImageConversionMethod Enum Default, Dynamic, User

This setting controls how a non-B/W image stored in the Engine is converted to a B/W image. Each method applies to grayscale and 24-bit color images and creates a B/W image in the Engine's memory.

Default: Binarizes the image with an automatic adaptive thresholding algorithm.

Dynamic: Compares each pixel to a dynamically calculated threshold. If the pixel intensity is higher than the threshold, the pixel is set to white; otherwise, it is set to black.

User: Binarizes the image with a user-defined threshold value, set by the Recognition.Preprocess.BlackWhiteImageConversionThreshold setting.

Recognition.Preprocess.BlackWhiteImageConversionThreshold Integer 0 to 255

The threshold to use when converting color images to bitonal (black/white) in preparation for recognizing the text on the image. The conversion separates the text intensities from the background intensities.

This is the equivalent of calling IntensityDetectCommand on the image with InColor set to the detected foreground (text) color, OutColor set to the detected background color, Channel set to "Master", HighThreshold set to 255, and LowThreshold set to the value of this setting. The default value is 185.
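
The range test described above can be sketched in a few lines of Python. This is an illustration only, not the IntensityDetectCommand implementation: it assumes that pixels whose intensity falls inside [low, high] receive the "in" value and all others receive the "out" value, and it leaves the choice of those two values to the caller, since the engine detects the foreground and background colors itself. The function and parameter names are hypothetical.

```python
DEFAULT_THRESHOLD = 185  # default value of BlackWhiteImageConversionThreshold


def intensity_detect(pixels, low=DEFAULT_THRESHOLD, high=255, in_value=1, out_value=0):
    """Map each grayscale intensity to in_value if low <= p <= high, else out_value.

    pixels is a list of rows, each row a list of intensities in 0..255.
    """
    if not 0 <= low <= high <= 255:
        raise ValueError("expected 0 <= low <= high <= 255")
    return [[in_value if low <= p <= high else out_value for p in row]
            for row in pixels]


if __name__ == "__main__":
    gray = [[10, 184, 185],
            [200, 255, 0]]
    for row in intensity_detect(gray):
        print(row)
```

Raising the threshold narrows the range of intensities treated as "in", and lowering it widens that range.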

Recognition.Preprocess.MobileImagePreprocess Boolean N/A

True to enable mobile image processing mode; otherwise, false. By default, the OCR engine tries to upscale images with a low resolution (DPI). However, the camera on most mobile devices takes pictures with a low resolution (for example, 72 DPI) and a large size in pixels, so upscaling these images consumes memory unnecessarily. If you are using the OCR engine to process images from a mobile camera, set the value of this setting to true.

Recognition.Preprocess.DownSampleLargeImage Boolean N/A

True to downsample large images prior to recognition; otherwise, false. Set the value of this setting to true to prevent the OCR engine from creating processing images (the images used for recognition) larger than 4000 by 4000 pixels, which conserves memory and resources. This value is ignored if the value of the MobileImagePreprocess setting is true.
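
The size limit and the override by MobileImagePreprocess can be pictured with the following Python sketch. It rests on assumptions the text above does not state: that the aspect ratio is preserved when an image is reduced, and that an ignored setting leaves the size untouched. The function name and parameters are hypothetical and do not correspond to LEADTOOLS calls.

```python
MAX_SIDE = 4000  # limit named in the DownSampleLargeImage description


def processing_image_size(width, height, down_sample_large_image, mobile_image_preprocess):
    """Return the (width, height) this setting alone would yield.

    Assumption: the image is scaled uniformly so that neither side exceeds MAX_SIDE.
    """
    # The setting is ignored when mobile preprocessing is enabled.
    if mobile_image_preprocess or not down_sample_large_image:
        return (width, height)
    if width <= MAX_SIDE and height <= MAX_SIDE:
        return (width, height)
    scale = min(MAX_SIDE / width, MAX_SIDE / height)
    return (max(1, int(width * scale)), max(1, int(height * scale)))


if __name__ == "__main__":
    print(processing_image_size(8000, 6000, True, False))
    print(processing_image_size(8000, 6000, True, True))
```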

Recognition.Preprocess.UseZoningEngine Boolean N/A

True to use the zoning engine to exclude graphics areas from preprocessing calculations such as deskew and auto-rotate; otherwise, false.

Recognition.Preprocess.MinimumAutoRotateConfidence Integer 0 to 100

Used by IOcrPage.AutoPreprocess to determine the minimum confidence percentage threshold to use when orienting pages. Default value is 26.

End:Recognition.PreProcessing EndCategory N/A End of the pre-processing settings category.
Recognition.Zoning BeginCategory N/A Beginning of the zoning settings category.
Recognition.Zoning.DisableMultiThreading Boolean N/A

True to disable multi-threading when performing auto-zoning; otherwise multi-threading is enabled. Multi-threading enhances the performance of the auto-zoning algorithm. However, it may be undesirable if the OCR engine is hosted in a server.

Recognition.Zoning.CropZoneImage Boolean N/A

If this flag is set to true then the Advantage engine will crop each zone from the original image and recognize it. This can improve the performance of the IOcrPage.Recognize method.

Recognition.Zoning.Options Enum None, Detect Text, Detect Graphics, Detect Table, Allow Overlap, Detect Accurate Zones, Recognize One Cell Table, Table Cells as Zones, Use Advanced Table Detection, Use Text Extractor, Detect Checkbox

These flags affect the way the IOcrPage.AutoZone method works. Values can be OR-ed. Possible values are:

None: If this is the only flag that is set, the engine will use default values to perform auto zoning.
Detect Text: Search for text zones inside the page image.
Detect Graphics: Search for graphic zones inside the page image.
Detect Table: Search for table zones inside the page image.
Allow Overlap: Allow zones to overlap; otherwise, detected zones will not overlap.
Detect Accurate Zones: Detect smaller and more accurate zones (like page paragraphs). Unless this flag is set, the auto zone algorithm will try to detect the largest possible zones.
Recognize One Cell Table: Detect tables that have only one cell as tables. Must be OR'ed with "Detect Table".
Table Cells as Zones: Treat each cell detected inside a table as its own zone. If this option is set, the zone types will be OcrZoneType.Text instead of OcrZoneType.Table. Must be OR'ed with "Detect Table".
Use Advanced Table Detection: Use advanced table detection for the most accurate results when the document contains tables. This option recursively and aggressively parses the document to locate the positions of tables and cells, and generates the most accurate representation of the original document and its tables in the final output. Must be OR'ed with "Detect Table".
Use Text Extractor: Improves text zone recognition by separating graphics and tables from text areas before extracting text.
Detect Checkbox: Search for checkbox zones inside the page image.
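
Because these values are combined with OR and three of them depend on "Detect Table", a small check can catch an incomplete combination before it is used. The Python sketch below models the flags with enum.Flag. The member names and numeric values are hypothetical stand-ins for the values listed above and are not LEADTOOLS identifiers.

```python
from enum import Flag, auto


class ZoningOptions(Flag):
    """Hypothetical model of the zoning option flags described above."""
    NONE = 0
    DETECT_TEXT = auto()
    DETECT_GRAPHICS = auto()
    DETECT_TABLE = auto()
    ALLOW_OVERLAP = auto()
    DETECT_ACCURATE_ZONES = auto()
    RECOGNIZE_ONE_CELL_TABLE = auto()
    TABLE_CELLS_AS_ZONES = auto()
    USE_ADVANCED_TABLE_DETECTION = auto()
    USE_TEXT_EXTRACTOR = auto()
    DETECT_CHECKBOX = auto()


# Flags documented as "Must be OR'ed with Detect Table".
REQUIRES_DETECT_TABLE = (
    ZoningOptions.RECOGNIZE_ONE_CELL_TABLE,
    ZoningOptions.TABLE_CELLS_AS_ZONES,
    ZoningOptions.USE_ADVANCED_TABLE_DETECTION,
)


def missing_table_dependency(options):
    """Return the names of table-dependent flags set without DETECT_TABLE."""
    if ZoningOptions.DETECT_TABLE in options:
        return []
    return [flag.name for flag in REQUIRES_DETECT_TABLE if flag in options]


if __name__ == "__main__":
    ok = (ZoningOptions.DETECT_TEXT | ZoningOptions.DETECT_TABLE
          | ZoningOptions.TABLE_CELLS_AS_ZONES)
    bad = ZoningOptions.DETECT_TEXT | ZoningOptions.RECOGNIZE_ONE_CELL_TABLE
    print(missing_table_dependency(ok))
    print(missing_table_dependency(bad))
```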
Recognition.Zoning.EnableDoubleZoning Boolean N/A

If this flag is set to true then the Advantage engine will perform a second internal autozoning on each text zone to generate more homogeneous zones for recognition. This can improve the performance of the IOcrPage.Recognize method.

End:Recognition.Zoning EndCategory N/A End of the zoning settings category
Recognition.Words BeginCategory N/A Beginning of the word recognition settings category
Recognition.Words.DiscardLowConfidenceWords Boolean N/A This setting controls the output. If true, words/characters with a low rating (rubbish words/characters) will not be included when saving the recognition results to any of the document formats supported by LEADTOOLS.
Recognition.Words.LowWordConfidence Integer 0 to 100

Discard any word with a confidence value less than this value. This setting only takes effect when DiscardLowConfidenceWords is set to true.
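
The relationship between the two word settings can be sketched as a simple filter. The Python below is an illustration only: the (text, confidence) pairs and the function name are hypothetical, and the engine applies this rule internally when saving results rather than exposing such a function.

```python
def filter_words(words, discard_low_confidence_words, low_word_confidence):
    """Drop words whose confidence is below the threshold, if discarding is enabled.

    words is a list of (text, confidence) pairs with confidence in 0..100.
    """
    if not discard_low_confidence_words:
        # LowWordConfidence has no effect unless discarding is switched on.
        return list(words)
    return [(text, conf) for text, conf in words if conf >= low_word_confidence]


if __name__ == "__main__":
    recognized = [("Invoice", 97), ("~;|", 12), ("Total", 60)]
    print(filter_words(recognized, True, 60))
    print(filter_words(recognized, False, 60))
```

A word whose confidence equals the threshold is kept, since only values less than the threshold are discarded.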

End:Recognition.Words EndCategory N/A End of the words recognition settings category.
Recognition.Adaption BeginCategory N/A Beginning of the recognition adaption settings category.
Recognition.Adaption.AdaptedDataFilePath Boolean N/A

Enables/disables learning of the Advantage OCR engine. When true, all character features are saved to a file after each recognition. At the next recognition this file is loaded in order to use the saved adapted character features, resulting in better recognition results.

Notes:

1- When performing form recognition, it is recommended that you turn this feature off. This way you compare the form against the original results.

2- When performing form processing, it is recommended you turn this feature on. This way, since the forms are identical, you get better OCR results with subsequent forms.

End:Recognition.Adaption EndCategory N/A End of the recognition adaption settings category.
Recognition.CharacterFilter BeginCategory N/A Beginning of the recognition character filters category.
Recognition.CharacterFilter.MinimumPixelWidth Integer 0 to Int32.MaxValue

Minimum width of a recognized character in pixels.

Recognition.CharacterFilter.MinimumPixelHeight Integer 0 to Int32.MaxValue

Minimum height of a recognized character in pixels.

Recognition.CharacterFilter.MinimumPixelSizeExcludeCharacters String No maximum and can be null

Characters to exclude from the minimum pixel width and height rule.

Recognition.CharacterFilter.DiscardNoiseLikeCharacters Boolean N/A

Ignore recognized characters that have features similar to noise.

End:Recognition.CharacterFilter EndCategory N/A End of the recognition character filters category.
Recognition.Fonts BeginCategory N/A Beginning of the fonts category.
Recognition.Fonts.EnableCapsCaps Boolean N/A

Enable Caps/Caps font recognition enhancements.

Recognition.Fonts.DetectFontStyles Enum None, Bold, Italic, Underline, SansSerif, Serif, Proportional, Superscript, Subscript, Strikeout

Enable or disable the detection of specific font properties. These flags affect the final generated document if the format supports fonts (such as PDF or DOC). Values can be OR-ed. Possible values are:

None: Do not detect any font styles.
Bold: Detect bold font style.
Italic: Detect italic font style.
Underline: Detect underline font style.
SansSerif: Detect Sans-Serif font style (such as Arial).
Serif: Detect Serif font style (such as Times New Roman).
Proportional: Detect proportional font style (such as Times New Roman or Arial) or non-proportional font style (such as Courier New).
Superscript: Detect superscript font style.
Subscript: Detect subscript font style.
Strikeout: Detect strikeout font style.
Recognition.Fonts.RecognizeFontAttributes Boolean N/A

Enable font attributes recognition. Disabling it can improve the speed of the IOcrPage.Recognize method.

End:Recognition.Fonts EndCategory N/A End of the fonts category.
Recognition.AutoRecognizeManager BeginCategory N/A Beginning of the auto-recognize manager category.
Recognition.AutoRecognizeManager.FormatSpeedOptimized Boolean N/A

Enable optimizing the recognition speed based on the final document format. For example, the OCR engine will not recognize font attributes such as italic or bold if the final format is Text.

Recognition.AutoRecognizeManager.DefaultDocumentOrientation Enum None, Portrait, Landscape

Default orientation for the generated document if a page is blank or contains graphics only. Possible values are:

None: Do not change the orientation.
Portrait: Try to change into portrait (make the width less than the height) if the page is empty or contains graphics only.
Landscape: Try to change into landscape (make the width greater than the height) if the page is empty or contains graphics only.
End:Recognition.AutoRecognizeManager EndCategory N/A End of the auto-recognize manager category.
End:Recognition EndCategory N/A End of the recognition settings category.

© 2006-2014 All Rights Reserved. LEAD Technologies, Inc.