#1 Posted : Sunday, January 28, 2018 8:37:22 PM(UTC)

pavlo

Groups: Registered
Posts: 1


Summary of Problem: I'm getting inconsistent OCR results for certain characters. For example, a PDF may contain the string "DOCUMENT00641" (ignore the quotes); sometimes the "6" is recognized as "6", and other times it is recognized as "G".
Goal: In the above example, I'd like the "6" to always be recognized as "6".

Source PDFs: Generated from Outlook Emails, so the quality is pretty good.

Current Settings:
1) Document Resolution is 300
2) ocrDocument.Engine.SettingManager.SetEnumValue("Recognition.RecognitionModuleTradeoff", "Accurate");
3) textOptions.Formatted = true;

More examples of inconsistency:
1) DOCUMENT00673 recognized as DOCUMENT00673 [Note: This is the desired result]
2) DOCUMENT00674 recognized as DOCUMENT00G74 [Note: NOT desired]
3) DOCUMENT00676 recognized as DOCUMENT00676 [Note: This is the desired result]
4) DOCUMENT00677 recognized as DOCUMENT00G77 [Note: NOT desired]
5) DOCUMENT00700 through DOCUMENT00799 are accurately recognized [Note: This is the desired result]

So it appears that in the 600 range, the OCR engine is unsure whether the hundreds digit "6" is alphabetic or numeric. Is it possible to configure the engine so that it leans towards recognizing that character as a number? The strings I've provided as examples are typically at the bottom of each page.
 


#2 Posted : Wednesday, January 31, 2018 3:49:23 PM(UTC)

Joe Z

Groups: Registered, Tech Support, Administrators
Posts: 63


The LEADTOOLS OCR SDK supports this only by explicitly specifying which types of characters a zone should recognize, via the zone's CharacterFilters property.

https://www.leadtools.com/help/leadtools/v19/dh/fo/ocrzone-characterfilters.html

Since your strings contain both letters and digits, one option is to create multiple zones for the string: restrict the first zone to letters only and the second zone to digits only. After the recognition process, you can combine the two results.
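As a rough illustration of the final "combine the two results" step only, here is a minimal C# sketch. It assumes the two zones have already been recognized and their text extracted as plain strings. The class name ZoneResultCombiner, the method Combine, and the expected label pattern are hypothetical and not part of the LEADTOOLS API; the zone setup itself is covered by the CharacterFilters documentation linked above and is not shown here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of LEADTOOLS: joins the text of a
// letters-only zone and a digits-only zone into a single label.
public static class ZoneResultCombiner
{
    // Assumed label shape, based on the examples in this thread:
    // uppercase letters followed by digits, e.g. "DOCUMENT00641".
    private static readonly Regex ExpectedLabel = new Regex(@"^[A-Z]+[0-9]+$");

    // lettersText: text recognized from the zone restricted to letters.
    // digitsText:  text recognized from the zone restricted to digits.
    public static string Combine(string lettersText, string digitsText)
    {
        // Zone text may carry surrounding whitespace or line breaks; trim it.
        string combined = (lettersText ?? string.Empty).Trim()
                        + (digitsText ?? string.Empty).Trim();

        // Fail loudly if the result does not look like the expected label.
        if (!ExpectedLabel.IsMatch(combined))
            throw new FormatException("Unexpected label: " + combined);

        return combined;
    }
}
```

For example, ZoneResultCombiner.Combine("DOCUMENT", " 00641\n") returns "DOCUMENT00641". The format check is optional; it is included so that a misplaced zone boundary shows up as an error rather than as a silently wrong label.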

With this in mind, which OCR engine are you using for this process? Occasionally our two engines will show different results depending on the input file.

Additionally, are you able to provide a sample file that we could look into? If you wish to send us a sample, please email it to support@leadtools.com rather than posting the file on our forums.
Joe Zhan
Developer Support Engineer
LEAD Technologies, Inc.
