Help Version 15.12.21
Recognizing Document Pages

Each zone on a page has a recognition module associated with it through the RasterDocumentZoneData.RecognizeModule property. The recognition module determines the type of data the zone contains and how that data is recognized. Depending on the module, additional options may be available during recognition. For example, if a zone is associated with the Multi-lingual Omni font Recognition module (MOR), its additional options can be set and retrieved using the MorEnableFaxMode property.

Similarly, if a zone is associated with the Hand Printed Numeral Recognition module, its additional options can be set using the HandPrintOptions property. If the zone is associated with the Optical Mark Recognition module (OMR), its additional options can be set using the OmrOptions property.
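The per-module options described above amount to a lookup from a zone's recognition module to the property that carries its extra settings. The Python sketch below models that lookup. The `Module` enum, the `Zone` class, and the `options_property_for` function are hypothetical illustrations, not part of the LEADTOOLS API; only the property names come from this topic.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Module(Enum):
    """Hypothetical stand-in for the recognition module assigned to a zone."""
    MOR = auto()                   # Multi-lingual Omni font Recognition
    HAND_PRINTED_NUMERAL = auto()  # Hand Printed Numeral Recognition
    OMR = auto()                   # Optical Mark Recognition
    OTHER = auto()                 # any module with no extra options in this topic


@dataclass
class Zone:
    """Hypothetical zone record; models RasterDocumentZoneData.RecognizeModule."""
    name: str
    module: Module


# Property names are taken from the topic text; the mapping itself is illustrative.
_EXTRA_OPTIONS = {
    Module.MOR: "MorEnableFaxMode",
    Module.HAND_PRINTED_NUMERAL: "HandPrintOptions",
    Module.OMR: "OmrOptions",
}


def options_property_for(zone: Zone) -> Optional[str]:
    """Return the name of the property holding module-specific options, or None."""
    return _EXTRA_OPTIONS.get(zone.module)


if __name__ == "__main__":
    for z in [Zone("body", Module.MOR), Zone("checkbox", Module.OMR), Zone("logo", Module.OTHER)]:
        print(z.name, "->", options_property_for(z))
```

Keeping the mapping in one table means a zone whose module has no extra options simply yields None instead of requiring a special case.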

For general information about the available recognition modules, refer to An Overview of Recognition Modules.

Depending on the recognition module associated with a zone, it may be beneficial to trade off recognition accuracy against recognition speed. Use the RecognizeModuleTradeoff property to tell the OCR engine to perform the most accurate recognition, the fastest recognition, or a balanced recognition. The same property also returns the current trade-off setting.
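Because a single property both sets and reports the trade-off, it behaves like an ordinary read/write property holding one of three values. The sketch below models that in Python. `Tradeoff`, `EngineSketch`, and `recognize_module_tradeoff` are hypothetical names, and the starting value is an assumption made only for the sketch, not the engine's documented default.

```python
from enum import Enum


class Tradeoff(Enum):
    """Hypothetical values for the three trade-off settings described above."""
    ACCURATE = "accurate"
    BALANCED = "balanced"
    FAST = "fast"


class EngineSketch:
    """Hypothetical engine whose single trade-off property is both set and read."""

    def __init__(self) -> None:
        # Assumed starting value for this sketch only; not a documented default.
        self._tradeoff = Tradeoff.BALANCED

    @property
    def recognize_module_tradeoff(self) -> Tradeoff:
        """Read the current trade-off setting."""
        return self._tradeoff

    @recognize_module_tradeoff.setter
    def recognize_module_tradeoff(self, value: Tradeoff) -> None:
        """Change the trade-off setting; rejects anything that is not a Tradeoff."""
        if not isinstance(value, Tradeoff):
            raise TypeError("expected a Tradeoff value")
        self._tradeoff = value


if __name__ == "__main__":
    engine = EngineSketch()
    engine.recognize_module_tradeoff = Tradeoff.ACCURATE  # favor accuracy over speed
    print(engine.recognize_module_tradeoff)  # Tradeoff.ACCURATE
```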

Set the following properties before starting the recognition process:

SpellLanguageId

EnableCorrection

EnableSubsystem

The EnableSubsystem and EnableCorrection properties enable or disable the checking subsystem used during verification.

When all necessary recognition options have been set, recognize the page or pages by calling Recognize.

Call the EnableEvents method to have the RasterDocumentRecognizeStatusCallback callback invoked during recognition. To stop the recognition status callback from firing, call the DisableEvents method.
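The EnableEvents/DisableEvents pair acts as a gate in front of the status callback: recognition runs either way, but the callback is only invoked while events are enabled. The Python sketch below models that gate. `RecognizerSketch`, its methods, and the `(pages_done, page_count)` callback signature are hypothetical and do not reflect the real RasterDocumentRecognizeStatusCallback parameters.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback signature: (pages_done, page_count).
StatusCallback = Callable[[int, int], None]


class RecognizerSketch:
    """Hypothetical recognizer showing an enable/disable gate for a status callback."""

    def __init__(self, callback: Optional[StatusCallback] = None) -> None:
        self._callback = callback
        self._events_enabled = False  # callback is silent until events are enabled

    def enable_events(self) -> None:
        self._events_enabled = True

    def disable_events(self) -> None:
        self._events_enabled = False

    def recognize(self, page_count: int) -> None:
        for done in range(1, page_count + 1):
            # ... the actual page recognition would happen here ...
            if self._events_enabled and self._callback is not None:
                self._callback(done, page_count)


if __name__ == "__main__":
    log: List[Tuple[int, int]] = []
    r = RecognizerSketch(lambda done, total: log.append((done, total)))
    r.recognize(2)      # events disabled: callback not invoked
    r.enable_events()
    r.recognize(2)      # callback invoked once per page
    r.disable_events()
    print(log)          # [(1, 2), (2, 2)]
```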

After recognition is complete, the recognized characters can be obtained and the recognition results can be saved to a file or to memory.

The collection of characters recognized on a specific page can be obtained by calling GetRecognizedCharacters. To add characters to this collection, call SetRecognizedCharacters.
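Adding characters through a get/set pair is naturally a read-modify-write sequence: fetch the page's collection, change it, then write the whole collection back. The Python sketch below models that sequence under the assumption that the setter stores the collection it is given. `RecognizedChar`, `ResultsSketch`, and their fields and methods are hypothetical; real recognized-character records carry more information.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RecognizedChar:
    """Hypothetical character record with only a code and a confidence value."""
    code: str
    confidence: int


class ResultsSketch:
    """Hypothetical per-page store modeling Get/SetRecognizedCharacters."""

    def __init__(self) -> None:
        self._pages: Dict[int, List[RecognizedChar]] = {}

    def get_recognized_characters(self, page: int) -> List[RecognizedChar]:
        # Return a copy, so changes only take effect after a set call.
        return list(self._pages.get(page, []))

    def set_recognized_characters(self, page: int, chars: List[RecognizedChar]) -> None:
        # Assumption for this sketch: the setter replaces the page's collection.
        self._pages[page] = list(chars)


if __name__ == "__main__":
    results = ResultsSketch()
    results.set_recognized_characters(0, [RecognizedChar("H", 95), RecognizedChar("i", 90)])
    chars = results.get_recognized_characters(0)
    chars.append(RecognizedChar("!", 100))       # add a character locally...
    results.set_recognized_characters(0, chars)  # ...then write the collection back
    print("".join(c.code for c in results.get_recognized_characters(0)))  # Hi!
```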

The recognition results can be saved to a file by calling SaveResultsToFile. The SaveResultOptions property controls the type of material exported, the way the material is stored, and the file type in which it is stored.

When saving recognition results to a file, you can use the AvailableOutputFileFormats method to obtain all output file formats supported by the OCR engine. To get specific information about a particular output file format, call GetTextFormatInfo.
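A common use of the list of available formats is to check an ordered preference list against it before saving, falling back when a preferred format is not supported. The Python sketch below shows that selection step only. `choose_output_format` and the format strings are hypothetical; a real engine reports its own format identifiers, and GetTextFormatInfo is not modeled here.

```python
from typing import Iterable, Optional, Sequence


def choose_output_format(available: Iterable[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred format the engine reports as available, else None."""
    supported = set(available)
    for fmt in preferred:
        if fmt in supported:
            return fmt
    return None


if __name__ == "__main__":
    # Hypothetical format names standing in for what the engine would report.
    available = ["plain-text", "rtf", "html"]
    print(choose_output_format(available, ["pdf", "rtf", "plain-text"]))  # rtf
```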

The recognition results can also be saved to memory by calling SaveResultsToMemory.

To get or set the special characters used in the recognition process, use the SpecialRejectedCharacter property.

Finally, to monitor the status of the OCR engine during recognition, use the RasterDocumentRecognizeStatusCallback callback.