Working with OCR Pages

After starting the OCR engine, you can begin working with document pages. Pages can be used with or without an OCR document.

The LEADTOOLS OCR methods provide support for the following tasks when working with OCR pages:

Creating one page from a BITMAPHANDLE directly without an OCR document.
Creating one or more OCR documents.
Adding pages to or removing pages from an OCR document (removing OCR pages is only available when working with memory-based OCR documents).
Getting information about one or more pages.
Updating pages.

As described in Programming with LEADTOOLS OCR Advantage, an L_OcrPage handle can be created directly using the L_OcrPage_FromBitmap method without using an L_OcrDocument handle. These pages can be zoned and recognized and the OCR results can be obtained directly using L_OcrPage_GetText or L_OcrPage_GetRecognizedCharacters.

If the pages are to be saved to a final document such as PDF or DOCX, then an L_OcrDocument handle is required.

An L_OcrDocument handle contains the pages of a document. You can create a new OCR document by calling the L_OcrDocumentManager_CreateDocument method. This method can create either memory-based or file-based documents.

Each OCR document can have one or more pages (L_OcrPage handles). L_OcrDocumentManager provides some functionality that you can use to access OCR document pages.

L_OcrDocument holds a list of L_OcrPage handles. Each of these L_OcrPage handles contains the bitmap handle used to create it (the bitmap used when the page is loaded or added) and a group of OCR zones for the page (either added manually or through auto-zoning).

L_OcrDocument provides some functionality to add, remove, get, set and iterate through the different pages of the document if the document is memory-based.

Adding a page to a file-based document involves taking a snapshot of the current recognition data and storing it internally. The page itself is not added to the internal pages list and is not required to stay in memory. With a file-based document, the user can only add new pages; pages cannot be removed or iterated through.
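The difference between the two document modes can be sketched with a small toy model. This is not the LEADTOOLS API: `ToyPage`, `ToyDocument`, and every `toy_` function below are hypothetical names invented for illustration, and the fixed-size array stands in for whatever storage the library actually uses. The sketch only mirrors the behavior described above: adding works in both modes, while removing and indexing pages works only when the document is memory-based.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, NOT the LEADTOOLS API. */
#define TOY_MAX_PAGES 16

typedef struct {
    int id;                        /* stands in for a page's recognition data */
} ToyPage;

typedef struct {
    bool memory_based;             /* true: keeps a page list; false: file-based */
    ToyPage pages[TOY_MAX_PAGES];  /* used only by memory-based documents */
    size_t page_count;             /* pages held in the list (memory-based only) */
    size_t snapshots;              /* snapshots stored (file-based only) */
} ToyDocument;

static ToyDocument toy_document_create(bool memory_based) {
    ToyDocument doc = { .memory_based = memory_based };
    return doc;
}

/* Adding a page is allowed in both modes. */
static bool toy_document_add_page(ToyDocument *doc, ToyPage page) {
    if (doc->memory_based) {
        if (doc->page_count == TOY_MAX_PAGES) return false;
        doc->pages[doc->page_count++] = page;
    } else {
        /* File-based: record a snapshot; the page is not kept in the list. */
        (void)page;
        doc->snapshots++;
    }
    return true;
}

/* Removing a page is refused for file-based documents. */
static bool toy_document_remove_page(ToyDocument *doc, size_t index) {
    if (!doc->memory_based || index >= doc->page_count) return false;
    for (size_t i = index + 1; i < doc->page_count; i++)
        doc->pages[i - 1] = doc->pages[i];
    doc->page_count--;
    return true;
}

/* Walks through the scenario in the text; returns the number of failed checks. */
static int toy_demo(void) {
    int failures = 0;
    ToyDocument mem = toy_document_create(true);
    ToyDocument file = toy_document_create(false);
    ToyPage a = { 1 }, b = { 2 };

    if (!toy_document_add_page(&mem, a)) failures++;
    if (!toy_document_add_page(&mem, b)) failures++;
    if (!toy_document_add_page(&file, a)) failures++;

    /* Memory-based: the list is available, so removal works. */
    if (!toy_document_remove_page(&mem, 0)) failures++;
    if (mem.page_count != 1 || mem.pages[0].id != 2) failures++;

    /* File-based: only the snapshot exists, so removal is rejected. */
    if (toy_document_remove_page(&file, 0)) failures++;
    if (file.page_count != 0 || file.snapshots != 1) failures++;

    return failures;
}
```

Calling `toy_demo()` from a `main` function exercises both modes. In real code, the corresponding operations would go through the L_OcrDocument functions named in this topic, whose exact signatures should be taken from the function reference.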

The following list contains the major functionality of L_OcrDocument:

Adding a new page to the document from the bitmap handle. The following table lists all OCR document page addition method groups:

<table> 
<thead> 
<tr class="header"> 
<th>Function</th> 
<th>Description</th> 
</tr> 
</thead> 
<tbody> 
<tr> 
<td><a href="L-OcrDocument-AddPage.md">L_OcrDocument_AddPage</a></td> 
<td>Adds the recognition data of an <a href="L-OcrPage.md">L_OcrPage</a> to a document. Works for both file-based and memory-based documents.</td> 
</tr> 
<tr> 
<td><a href="L-OcrDocument-InsertPage.md">L_OcrDocument_InsertPage</a></td> 
<td>Inserts the recognition data of an <a href="L-OcrPage.md">L_OcrPage</a> into a specific location on a single page of a document. Works only on memory-based documents.</td> 
</tr> 
</tbody> 
</table>

Performing high-level "fire and forget" OCR recognition, with multithreading support, through L_OcrAutoRecognizeManager. Single or multiple files can be recognized and the recognition results saved to a file with a single function call to L_OcrAutoRecognizeManager_Run, L_OcrAutoRecognizeManager_RunJob or L_OcrAutoRecognizeManager_RunJobAsync.

Help Version 19.0.2017.10.27

© 1991-2017 Apryse Software Corp. All Rights Reserved.
