Recognizing Document Pages

Each zone on a page is associated with a recognition module through the ZONEDATA2.RecogModule member. The recognition module identifies the type of data contained in the zone and determines how that data is recognized. Depending on the type of recognition module, additional options may be available for use during recognition.

After recognition is complete, the recognized characters can be obtained and the recognition results can be saved to a file or to memory.

For more information refer to:

An Overview of Recognition Modules

Drawing Pages and Zones

Working with Pages

Working with Zones

OCR Professional

The additional options depend on the recognition module. For example, if a zone is associated with the Multi-lingual Omnifont Recognition (MOR) module, set the options for this module using L_Doc2SetMOROptions. To get the current MOR options, use L_Doc2GetMOROptions.

Similarly, if a zone is associated with the Hand Printed Numeral Recognition (ICR-HNR) module, set the additional options using L_Doc2SetHandPrintOptions and retrieve them using L_Doc2GetHandPrintOptions. If the zone is associated with the Optical Mark Recognition (OMR) module, set the additional options using L_Doc2SetOMROptions and retrieve them using L_Doc2GetOMROptions.

For some general information about available recognition modules, refer to An Overview of Recognition Modules.

Depending on the type of recognition module associated with a zone, it may be beneficial to trade off recognition accuracy against recognition speed. Using the L_Doc2SetRecognizeModuleTradeOff function, you can tell the OCR engine to perform the most accurate recognition, the fastest recognition, or a balanced recognition. To get the current trade-off setting for the OCR engine, call L_Doc2GetRecognizeModuleTradeOff.

To apply pre-processing that enhances a document (rotating, inverting, brightening, controlling the threshold, or changing the binarization mode), call L_Doc2GetPreProcessingOptions to get the engine's current pre-processing options. To update these options, call L_Doc2SetPreProcessingOptions. Call L_Doc2SetPreProcessingOptions before calling L_Doc2FindZones or L_Doc2Recognize, because the pre-processing options affect the auto-zoning and recognition results.
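
The ordering requirement can be illustrated with a small self-contained sketch. Everything in it (mock_engine, mock_preproc, their fields, and the mock_* functions) is a hypothetical stand-in invented for this illustration and is not the LEADTOOLS API; the actual option structure and signatures are described in the reference pages for L_Doc2GetPreProcessingOptions and L_Doc2SetPreProcessingOptions. The mock only models the idea that zoning uses whatever options are in effect at the moment it runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option set; the field names are invented for this sketch. */
typedef struct {
    int auto_rotate;   /* nonzero: rotate the page before zoning */
    int invert;        /* nonzero: invert the page before zoning */
} mock_preproc;

/* Hypothetical engine: holds the current options and remembers which
   options were in effect when zones were last found. */
typedef struct {
    mock_preproc current;
    mock_preproc used_for_zoning;
} mock_engine;

/* Get step: copy the engine's current options to the caller. */
void mock_get_preproc(const mock_engine *engine, mock_preproc *out)
{
    assert(engine != NULL && out != NULL);
    *out = engine->current;
}

/* Set step: store the caller's (modified) options in the engine. */
void mock_set_preproc(mock_engine *engine, const mock_preproc *in)
{
    assert(engine != NULL && in != NULL);
    engine->current = *in;
}

/* Zoning step: snapshots the options that are in effect right now. */
void mock_find_zones(mock_engine *engine)
{
    assert(engine != NULL);
    engine->used_for_zoning = engine->current;
}

/* Get, modify, set, then zone: the order the text recommends. */
void mock_enable_rotation_then_zone(mock_engine *engine)
{
    mock_preproc opts;
    mock_get_preproc(engine, &opts);   /* start from the current settings */
    opts.auto_rotate = 1;              /* change only what is needed */
    mock_set_preproc(engine, &opts);   /* must happen before zoning */
    mock_find_zones(engine);
}
```

In this mock, options that are changed after mock_find_zones has run are stored in the engine but do not alter the zoning that was already computed, which is why the set call belongs before the zoning or recognition call.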

When all necessary recognition options have been set, the page(s) can be recognized by calling L_Doc2Recognize. To get information on the status of the recognition process during recognition, pass a valid pointer to a RECOGNIZESTATUSCALLBACK2 function to the L_Doc2Recognize function.
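
The status-callback pattern can be sketched in plain C as follows. This is a minimal illustration under stated assumptions: status_cb, progress_state, on_status, and mock_recognize are hypothetical names, and the callback signature shown here is not the RECOGNIZESTATUSCALLBACK2 signature. The convention that a nonzero return value aborts recognition is also an assumption of the sketch, not something stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; NOT the RECOGNIZESTATUSCALLBACK2 signature.
   In this sketch, returning nonzero asks the recognizer to stop. */
typedef int (*status_cb)(int page_index, int percent, void *user_data);

/* Caller-owned state that the callback updates through user_data. */
typedef struct {
    int calls;         /* number of times the callback fired */
    int last_percent;  /* most recent progress value */
    int abort_at;      /* stop once progress reaches this value; -1 = never */
} progress_state;

/* Example callback: records progress and optionally requests an abort. */
int on_status(int page_index, int percent, void *user_data)
{
    progress_state *state = (progress_state *)user_data;
    assert(state != NULL);
    (void)page_index;  /* unused in this example */
    state->calls++;
    state->last_percent = percent;
    return (state->abort_at >= 0 && percent >= state->abort_at) ? 1 : 0;
}

/* Mock recognizer: reports progress in 25% steps for each page and stops
   early if the callback requests it. A NULL callback is allowed.
   Returns the number of pages that were fully processed. */
int mock_recognize(int page_count, status_cb callback, void *user_data)
{
    int pages_done = 0;
    for (int page = 0; page < page_count; page++) {
        for (int percent = 25; percent <= 100; percent += 25) {
            if (callback != NULL && callback(page, percent, user_data) != 0)
                return pages_done;
        }
        pages_done++;
    }
    return pages_done;
}
```

Passing caller-owned state through the user-data pointer keeps the callback free of global variables, so the same callback can serve several recognition runs at once.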

The collection of characters recognized for a specific page can be obtained using L_Doc2GetRecognizedCharacters. To add any characters to this collection of recognized characters, call L_Doc2SetRecognizedCharacters. When this collection of recognized characters is no longer needed, it should be freed by calling L_Doc2FreeRecognizedCharacters.
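
The get-then-free pairing used here, and by the other Get/Free function pairs on this page, can be sketched as below. The functions mock_get_chars and mock_free_chars are hypothetical stand-ins, not LEADTOOLS calls; in particular, taking a pointer to the caller's pointer in the free function is a design choice of this sketch and should not be read as the signature of L_Doc2FreeRecognizedCharacters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical getter: allocates a copy of the page's characters.
   On success returns 0, and the caller owns *out_chars.
   On allocation failure returns -1 and leaves the outputs untouched. */
int mock_get_chars(const char *page_text, char **out_chars, size_t *out_count)
{
    assert(page_text != NULL && out_chars != NULL && out_count != NULL);
    size_t count = strlen(page_text);
    char *buffer = malloc(count + 1);
    if (buffer == NULL)
        return -1;
    memcpy(buffer, page_text, count + 1);  /* include the terminator */
    *out_chars = buffer;
    *out_count = count;
    return 0;
}

/* Hypothetical release function: frees the collection and clears the
   caller's pointer so that a second call is harmless. */
void mock_free_chars(char **chars)
{
    assert(chars != NULL);
    free(*chars);   /* free(NULL) is a no-op */
    *chars = NULL;
}
```

Clearing the caller's pointer in the release function guards against use after free and makes an accidental second release safe in this sketch.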

L_Doc2GetRecognizedCharacters fills in color indexes in RECOGCHARS2.nFGColorIndex and RECOGCHARS2.nBGColorIndex. To get the colors associated with these indexes, first get the colors table by calling L_Doc2GetRecognizedCharactersColors, and then look up each color in that table by its index.
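
The index-to-color step can be sketched as a bounds-checked table lookup. The names mock_char and lookup_color are hypothetical, and representing a color as a 32-bit integer is an assumption made for the illustration; the actual layout of the colors table is defined by L_Doc2GetRecognizedCharactersColors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two color-index members of RECOGCHARS2. */
typedef struct {
    int fg_index;  /* plays the role of nFGColorIndex */
    int bg_index;  /* plays the role of nBGColorIndex */
} mock_char;

/* Looks up one color in the table by index.
   Returns 1 and writes the color to *out_color if the index is valid;
   returns 0 and leaves *out_color unchanged otherwise. */
int lookup_color(const uint32_t *table, size_t table_count,
                 int index, uint32_t *out_color)
{
    assert(table != NULL && out_color != NULL);
    if (index < 0 || (size_t)index >= table_count)
        return 0;
    *out_color = table[index];
    return 1;
}
```

With a three-entry table, a character whose fg_index is 2 resolves to the third entry, while an index of 3 or -1 is rejected instead of reading outside the table.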

To get the character choices after a call to L_Doc2GetRecognizedCharacters, call L_Doc2GetCharacterChoices. To free the memory allocated by this function, call L_Doc2FreeCharacterChoices.

Once the characters for a specific page have been determined using L_Doc2GetRecognizedCharacters, L_Doc2GetRecognizedWords can be called to combine the recognized characters into words. To change the contents of the recognized words, change the set of recognized characters by calling L_Doc2SetRecognizedCharacters. To save the updated recognized characters to a file, call L_Doc2SaveResultsToFile or L_Doc2SaveResultsToFile2. When the collection of recognized words is no longer needed, it should be freed by calling L_Doc2FreeRecognizedWords.

To get word suggestions after a call to L_Doc2GetRecognizedWords, call L_Doc2GetWordSuggestions. To free the memory allocated by this function, call L_Doc2FreeWordSuggestions.

Use L_Doc2SaveResultsToFile2 when you want to save a single set of recognition results to several different formats while maintaining quality. Use L_Doc2SaveResultsToFile when memory is a constraint, since it uses less memory than L_Doc2SaveResultsToFile2. However, with L_Doc2SaveResultsToFile, OCR must be performed separately for each file format in order to maintain quality.

Call L_Doc2GetSupportedEngineFormats to obtain a list of all supported native engine formats. Retrieve the friendly name for each of the retrieved formats by calling L_Doc2GetEngineFormatFriendlyName. Call L_Doc2FreeEngineFormats when the format list returned by the L_Doc2GetSupportedEngineFormats function is no longer needed.

The recognition results can be saved to a file by calling L_Doc2SaveResultsToFile or L_Doc2SaveResultsToFile2. The type of material exported to a file, the method in which the material is stored and the file type in which it is stored can all be controlled using L_Doc2SetRecognitionResultOptions. To get the current recognition results settings, call L_Doc2GetRecognitionResultOptions.

The call to L_Doc2SetRecognitionResultOptions specifies the document writer format. To set the options for that document writer format, call L_Doc2SetDocumentWriterOptions. To get the current options for the document writer format, call L_Doc2GetDocumentWriterOptions.

The OCR engine supports different settings for each output format, and these settings affect the output file. To get the settings for a specific format, call L_Doc2GetOutputFormatSettings. Update the settings, and then call L_Doc2SetOutputFormatSettings. Finally, call L_Doc2SaveResultsToFile to save the output results to a file.

When saving recognition results to a file, you can use L_Doc2EnumOutputFileFormats to enumerate all available output file formats supported by the OCR engine. This function will report each file format to an ENUMOUTPUTFILEFORMATS2 callback function. To get specific information about a particular output file format, call L_Doc2GetTextFormatInfo.

To get or set special characters used in the recognition process, use L_Doc2GetSpecialChar and L_Doc2SetSpecialChar.

Finally, to get the status of the OCR engine at any time, use L_Doc2GetStatus.

Help Version 19.0.2017.10.27

LEADTOOLS Professional OCR C API Help
