OCR Frequently Asked Questions

Q. How do I define multiple zones on an image, OCR them all at once, and get the results for each zone separately so that I can save it to a database?

A. There is an example in the OCR demo. You can open the project to see the source code for the demo. To use the demo:

Load an image.
Click Page | Insert Current Page.
Either click and drag to create zones manually and then click OCR | Recognize Page..., or click OCR | Recognize Page... directly. This recognizes all zones on the page.
Once data is recognized, click OCR | Get Recognized Words to retrieve each word from each zone.

If the page has more than one zone, the save results function saves all zone results to one file, so saving each zone's results separately requires including only one zone at recognition time. To retrieve the results of each zone from a single recognition pass instead, call L_Doc2GetRecognizedCharacters / L_Doc2GetRecognizedCharactersExt (OCR Pro). The function updates ppRecogChars, and consecutive characters belong to the same zone for as long as the nZoneIndex member does not change, so you can collect the characters zone by zone.
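The grouping step described above can be sketched in plain C. This is a minimal illustration, not the LEADTOOLS API: ZoneChar is a hypothetical stand-in for one recognized character (the real RECOGCHARS2 layout differs), and CopyZoneRun is a hypothetical helper. The only thing assumed about the real data is that each character carries a zone index and that characters of the same zone are stored consecutively.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one recognized character. This is NOT the
   real RECOGCHARS2 layout; only the zone index and the character matter. */
typedef struct {
    int  nZoneIndex;  /* zone in which the character was recognized */
    char ch;          /* recognized character, simplified to one byte */
} ZoneChar;

/* Copies the run of characters that starts at `start` and shares one
   nZoneIndex into `buf` (truncated to bufSize - 1 bytes, always
   NUL-terminated). Returns the index of the first character of the
   next zone, or `count` when the array is exhausted. */
static size_t CopyZoneRun(const ZoneChar *chars, size_t count, size_t start,
                          char *buf, size_t bufSize)
{
    size_t i = start;
    size_t len = 0;

    assert(buf != NULL && bufSize > 0);

    if (start < count) {
        int zone = chars[start].nZoneIndex;
        /* Consume characters until the zone index changes. */
        while (i < count && chars[i].nZoneIndex == zone) {
            if (len + 1 < bufSize)
                buf[len++] = chars[i].ch;
            i++;
        }
    }
    buf[len] = '\0';
    return i;
}
```

To process every zone, call the helper in a loop: begin with start set to 0, read chars[start].nZoneIndex to identify the zone, store the text in buf (for example, in a database row for that zone), and continue from the returned index until it equals count.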

Q. How do I define, load, and save templates for different types of forms?

A. There is an example in the OCR demo. You can open the project to see the source code for the demo. To use the demo:

Load an image.
Click Page | Insert Current Page.
Either click and drag to create zones manually, or click Zone | Find Zones... to create zones on the page automatically.
Once the image is zoned, click Zone | Export Zone File... to save the zones to a file, or Zone | Load Zone File... to load the zones from disk.

Q. How do I support various European languages?

A. There is an example in the OCR demo. You can open the project to see the source code for the demo. To use the demo:

Choose Language | Select Languages.

Q. How do I get a confidence value for each character that is recognized?

A. Refer to the following topic on the LEADTOOLS Support Forum:

https://www.leadtools.com/support/forum/posts/m8966-Postprocessing-in-OCR

Call L_Doc2GetRecognizedCharacters / L_Doc2GetRecognizedCharactersExt (OCR Pro), and check the nConfidence member in the RECOGCHARS2 (OCR Pro) structure for each character.
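One common use of the per-character confidence is to flag doubtful characters for manual review. The following is a minimal sketch of that idea, not LEADTOOLS code: ScoredChar is a hypothetical stand-in for a recognized character, MarkLowConfidence is a hypothetical helper, and the sketch assumes that a higher nConfidence value means a more certain result. Check the RECOGCHARS2 documentation for the actual range and direction of the value before choosing a threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one recognized character. This is NOT the
   real RECOGCHARS2 layout. */
typedef struct {
    char ch;           /* recognized character, simplified to one byte */
    int  nConfidence;  /* ASSUMPTION: higher means more certain */
} ScoredChar;

/* Writes the recognized text to `out`, replacing every character whose
   confidence is below `minConfidence` with `marker`. The output is
   truncated to outSize - 1 bytes and always NUL-terminated. Returns the
   number of low-confidence characters found in the whole input, even if
   the output was truncated. */
static size_t MarkLowConfidence(const ScoredChar *chars, size_t count,
                                int minConfidence, char marker,
                                char *out, size_t outSize)
{
    size_t flagged = 0;
    size_t len = 0;
    size_t i;

    assert(chars != NULL || count == 0);
    assert(out != NULL && outSize > 0);

    for (i = 0; i < count; i++) {
        int low = chars[i].nConfidence < minConfidence;
        if (low)
            flagged++;
        if (len + 1 < outSize)
            out[len++] = low ? marker : chars[i].ch;
    }
    out[len] = '\0';
    return flagged;
}
```

A nonzero return value indicates that at least one character fell below the threshold, which an application could use to route the page to a verification step.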

Q. How do I filter the OCR recognition results to eliminate false characters and increase accuracy?

A. You can filter out false positives by setting the character filter in the ZONEDATA2 (OCR Pro) structure. For example, if you wish to recognize only numbers, you would set the character filter to recognize numbers only as follows:

/* hDoc, nPageIndex, nRet, and VerificationCB are assumed to be declared elsewhere. */
ZONEDATA2 ZoneData;
memset(&ZoneData, 0, sizeof(ZONEDATA2));
ZoneData.uStructSize = sizeof(ZONEDATA2);
/* Zone rectangle */
ZoneData.rcArea.left = 100;
ZoneData.rcArea.top = 100;
ZoneData.rcArea.right = 200;
ZoneData.rcArea.bottom = 200;
ZoneData.FillMethod = DOC2_FILL_DEFAULT;
ZoneData.RecogModule = DOC2_RECOGMODULE_AUTO;
/* Restrict recognition in this zone to numbers */
ZoneData.CharFilter = DOC2_ZONE_CHAR_FILTER_NUMBERS;
ZoneData.Type = DOC2_ZONE_FLOWTEXT;
ZoneData.uFlags = 0;
ZoneData.pfnCallback = VerificationCB;
ZoneData.pUserData = NULL;
/* Add the zone to the page */
nRet = L_Doc2AddZone(hDoc, nPageIndex, 0, &ZoneData);

To change a zone that already exists on a specific page, call L_Doc2UpdateZone / L_Doc2UpdateZoneExt (OCR Pro).

Q. How do I get the coordinates of each recognized word, so that I can locate each word on the image?

A. Refer to the following topic on the LEADTOOLS Support Forum:

https://www.leadtools.com/support/forum/posts/m2546-Find-words-in-document-and-highlight-with-annotations

Call L_Doc2GetRecognizedCharacters / L_Doc2GetRecognizedCharactersExt (OCR Pro) to get each recognized character. Call L_Doc2GetRecognizedWords / L_Doc2GetRecognizedWordsExt (OCR Pro) to get a list of all recognized words.
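Once the word list is available, locating a word on the image is a search over that list followed by a look at the matching entry's rectangle. The sketch below shows the search only. WordBox is a hypothetical stand-in for one recognized word (its field names and the 64-byte text buffer are assumptions, not the real LEADTOOLS structure), and FindWord is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for one recognized word and its bounding
   rectangle on the page. This is NOT the real LEADTOOLS structure. */
typedef struct {
    char szWord[64];  /* recognized text, NUL-terminated */
    int  left;
    int  top;
    int  right;
    int  bottom;
} WordBox;

/* Returns the index of the first entry at or after `start` whose text
   equals `target` (case-sensitive), or -1 if there is none. */
static int FindWord(const WordBox *words, int count,
                    const char *target, int start)
{
    int i;

    assert(target != NULL);
    assert(words != NULL || count <= 0);

    for (i = (start < 0 ? 0 : start); i < count; i++) {
        if (strcmp(words[i].szWord, target) == 0)
            return i;
    }
    return -1;
}
```

To highlight every occurrence of a word, call the helper repeatedly, passing the previous result plus one as start, and draw each matching entry's rectangle over the image until the helper returns -1.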

Q. How do I output OCR results to memory?

A. There is an example in the OCR Memory demo. You can open the project to see the source code for the demo. The OCRMem demo will save recognition results to memory. To use the demo:

Run the OCR Memory demo.
Click File | Open, and select the file to open.
Click OCR | Add Page to add the loaded image to the internal OCR document pages.
Click OCR | Recognize to recognize the added page.
Click OCR | Save Results to Memory. The demo shows all recognition results in a message box.

Q. How do I output OCR results to XML?

A. There is an example in the OCR demo. You can open the project to see the source code for the demo. To use the demo:

Load an image.
Click Page | Insert Current Page.
Either click and drag to create zones manually and then click OCR | Recognize Page..., or click OCR | Recognize Page... directly. This recognizes all zones on the page.
Once data is recognized, click OCR | Save Results. Select XML in the File Formats ComboBox, choose a file name, then click OK.

Q. How do I output OCR results to image-over-text PDF?

A. There is an example in the OCR File demo. This demo saves to many different formats, one of which is image-over-text PDF. You can open the project to see the source code for the demo.

The OCR File demo can save the recognition results in any of the supported output formats. Choose the desired output format.

Q. How do I add user-defined words to the recognition library?

A. Refer to: Working with a Dictionary

Q. How do I recognize magnetic ink (MICR) characters on bank check images?

A. There is an example in the MICR demo. Open the demo project to see the source code for the demo. To use the demo:

Run the MICR demo.
Make sure that the zone coordinates that will be added to the page are based on "MICR_SAMPLE.tif" (shipped with the LEADTOOLS setup).
When loading other images, update the zone coordinates and build the demo again.