OCR Engine and RasterCodecs/DocumentWriter Usage

During its lifetime, the OCR engine uses a RasterCodecs object to load input raster images and a DocumentWriter object to create output documents.

Initially, pass RasterCodecs and/or DocumentWriter instances to the IOcrEngine.Startup method. These instances serve as templates for any RasterCodecs and DocumentWriter instances the engine later creates for the operations described below.

If a null reference is passed as the RasterCodecs object to IOcrEngine.Startup, the OCR engine creates a RasterCodecs object during initialization that can later be accessed through IOcrEngine.RasterCodecsInstance. When the OCR engine creates this instance, it changes the following properties from their default values:

rasterCodecsInstance.Options.RasterizeDocument.Load.XResolution = 300 
rasterCodecsInstance.Options.RasterizeDocument.Load.YResolution = 300 
rasterCodecsInstance.Options.Pdf.Load.DisplayDepth = 0 

These values are changed so that document file formats that do not contain a physical size, such as PDF, XPS, DOC, DOCX (2007/2010), XLS, XLSX (2007/2010) and HTML, are loaded at a resolution suitable for recognition. For more information, refer to CodecsRasterizeDocumentLoadOptions.XResolution, CodecsRasterizeDocumentLoadOptions.YResolution and CodecsPdfLoadOptions.DisplayDepth.
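The effect of XResolution and YResolution can be seen with simple arithmetic: a page rasterized from a document format is roughly its physical size in inches multiplied by the load resolution. The sketch below is a worked example only, not LEADTOOLS code; the function name is hypothetical, and the US Letter page and the 96 DPI comparison value are assumptions chosen for illustration.

```python
# Worked example, NOT the LEADTOOLS API: pixel size of a rasterized page is
# approximately its physical size (inches) times the load resolution (DPI).
def rasterized_size(width_in: float, height_in: float,
                    x_dpi: int, y_dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for a page rasterized at the given DPI."""
    return round(width_in * x_dpi), round(height_in * y_dpi)


# US Letter page (8.5 x 11 inches) at the 300 DPI the OCR engine sets
print(rasterized_size(8.5, 11, 300, 300))  # (2550, 3300)
# The same page at 96 DPI, a lower value used here only for comparison
print(rasterized_size(8.5, 11, 96, 96))    # (816, 1056)
```

At 300 DPI, the characters on the page receive roughly three times as many pixels in each direction as at 96 DPI, which is why a higher load resolution is preferred for recognition.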

If your own RasterCodecs object is passed to IOcrEngine.Startup, set the same properties as above if the application needs to load such document formats.

If a null reference is passed as the DocumentWriter object to IOcrEngine.Startup, the OCR engine also creates a DocumentWriter object during initialization that can later be accessed through IOcrEngine.DocumentWriterInstance. Unlike the RasterCodecs instance, this initial object has all options set to their default values.
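The null-handling rules of the startup method can be summarized in a small model. This is a hypothetical Python sketch, NOT the LEADTOOLS API: the Codecs, Writer and Engine classes and their field names are invented for illustration, and None stands for "left at the library default" because the real default values are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins, NOT the LEADTOOLS API. None means "left at the
# library default"; the real default values are not modeled here.
@dataclass
class Codecs:
    x_resolution: Optional[int] = None
    y_resolution: Optional[int] = None
    pdf_display_depth: Optional[int] = None


@dataclass
class Writer:
    pdf_option: Optional[str] = None  # hypothetical writer option, left at default


class Engine:
    """Models only the null handling of the engine startup described in the text."""

    def startup(self, codecs: Optional[Codecs] = None,
                writer: Optional[Writer] = None) -> None:
        if codecs is None:
            # Engine-created codecs: the three values the OCR engine changes
            codecs = Codecs(x_resolution=300, y_resolution=300, pdf_display_depth=0)
        # A caller-supplied codecs object is kept exactly as passed in
        self.raster_codecs_instance = codecs
        # An engine-created writer keeps all options at their defaults
        self.document_writer_instance = writer if writer is not None else Writer()


# Null passed: the engine creates and adjusts its own codecs object
engine = Engine()
engine.startup()
print(engine.raster_codecs_instance)
# Codecs(x_resolution=300, y_resolution=300, pdf_display_depth=0)

# Own instance passed: the caller sets the same values before startup
mine = Codecs(x_resolution=300, y_resolution=300, pdf_display_depth=0)
other = Engine()
other.startup(codecs=mine)
print(other.raster_codecs_instance is mine)  # True
```

The asymmetry is the point of the model: only the engine-created RasterCodecs is adjusted, while an engine-created DocumentWriter and any caller-supplied object are left untouched.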

Because the LEADTOOLS OCR engine supports multithreading, each IOcrDocument requires its own RasterCodecs and DocumentWriter instances to load images and create document files. These objects are created automatically when the IOcrDocument is created through IOcrDocumentManager.CreateDocument and can be accessed through IOcrDocument.RasterCodecsInstance and IOcrDocument.DocumentWriterInstance, respectively.

Use these instances, rather than the IOcrEngine instances, when loading images into an IOcrDocument or when using the DocumentWriter directly to change options or convert LTD files, especially in multithreaded scenarios.

Initially, these objects contain the same options as IOcrEngine.RasterCodecsInstance and IOcrEngine.DocumentWriterInstance. The IOcrDocument.UseEngineInstanceOptions property controls whether the document must re-get these options from the engine before using them. Its default value is true, which makes it easy to change RasterCodecs and DocumentWriter options globally for all current and future IOcrDocument objects. For example, suppose one IOcrDocument must be saved to PDF with the Image/Text option and another to PDF/A, both using the same IOcrEngine. Set IOcrDocument.UseEngineInstanceOptions to false for each document, then change the PDF options through each document's IOcrDocument.DocumentWriterInstance instead of through IOcrEngine.DocumentWriterInstance.
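The option-inheritance behavior of UseEngineInstanceOptions can be modeled in a few lines. This is a hypothetical Python sketch, NOT the LEADTOOLS API: the classes, the single pdf_type field standing in for all DocumentWriter options, and the option values such as "Image/Text" are invented for illustration. It assumes, per the text, that a new document starts with a copy of the engine's options and, while the flag is true, re-gets them before each use.

```python
import copy
from dataclasses import dataclass


# Hypothetical model, NOT the LEADTOOLS API: one string option stands in for
# the full set of DocumentWriter options.
@dataclass
class WriterOptions:
    pdf_type: str = "default"


class Engine:
    def __init__(self) -> None:
        self.document_writer_options = WriterOptions()

    def create_document(self) -> "Document":
        return Document(self)


class Document:
    def __init__(self, engine: Engine) -> None:
        self._engine = engine
        # A new document starts with a copy of the engine's options
        self.document_writer_options = copy.deepcopy(engine.document_writer_options)
        # Models IOcrDocument.UseEngineInstanceOptions, which defaults to true
        self.use_engine_instance_options = True

    def options_for_save(self) -> WriterOptions:
        if self.use_engine_instance_options:
            # Re-get the engine's current options before use
            self.document_writer_options = copy.deepcopy(self._engine.document_writer_options)
        return self.document_writer_options


engine = Engine()
shared = engine.create_document()      # keeps following the engine
image_text = engine.create_document()  # will use its own options
pdf_a = engine.create_document()       # will use its own options
for doc in (image_text, pdf_a):
    doc.use_engine_instance_options = False
image_text.document_writer_options.pdf_type = "Image/Text"  # hypothetical value
pdf_a.document_writer_options.pdf_type = "PDF/A"            # hypothetical value
engine.document_writer_options.pdf_type = "engine-wide"     # global change

print(shared.options_for_save().pdf_type)      # engine-wide
print(image_text.options_for_save().pdf_type)  # Image/Text
print(pdf_a.options_for_save().pdf_type)       # PDF/A
```

Note that while the flag is true, any change made directly to a document's own options is overwritten by the engine's options at the next use, which is why the flag must be turned off before per-document options are set.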

Help Version 23.0.2024.3.4
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.

LEADTOOLS Imaging, Medical, and Document