#1 Posted : Monday, February 22, 2010 8:36:59 PM(UTC)

dimak  

Groups: Registered
Posts: 4


Hi,
I am having problems parsing OCR results from a Table-type zone.
My goal is to build a table with columns and rows.
Here is what I get:

1. With the OCR Plus engine, the last character of each word/cell has the following Position information:
OcrCharacterPosition.EndOfLine | OcrCharacterPosition.EndOfParagraph | OcrCharacterPosition.EndOfWord | OcrCharacterPosition.EndOfCell
Because every word carries the same flags, I cannot even differentiate lines in the table.

2. With the OCR Professional engine, the last character of each word/cell has the following Position information:
OcrCharacterPosition.EndOfLine | OcrCharacterPosition.EndOfWord | OcrCharacterPosition.EndOfCell

As with the OCR Plus engine, this information is useless for building the table.
Do you have any ideas on how to solve the problem?

Thanks in advance
 


#2 Posted : Tuesday, February 23, 2010 4:20:40 AM(UTC)

Adnan Ismail  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

Do you have a sample file on which you tried to perform OCR and build a table?
If so, please send it to us and explain what type of attributes you want to detect, and we will try to tell you whether that is possible.

If you want to submit an attachment, put it in a ZIP or RAR file and don't use the Preview feature. You can also send it in an email to support@leadtools.com and mention this forum post.
 
#3 Posted : Tuesday, February 23, 2010 5:38:31 AM(UTC)

dimak  

Groups: Registered
Posts: 4


Hi,
Here are the test TIF and zone files I use.
There is only one zone, with ZoneType=Table.

Thanks in advance
File Attachment(s):
lead_test.rar (26kb) downloaded 23 time(s).
 
#4 Posted : Wednesday, February 24, 2010 6:14:57 AM(UTC)

Adnan Ismail  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

The image you sent me does not contain a drawn table. Because there are no grid lines, our engines will recognize these numbers as numeric values and not as part of a table.

If the table lines are not drawn on the image, you can determine whether words are aligned by comparing the locations of the end-of-word characters using the OcrCharacter.Bounds member. If the locations are almost equal in the horizontal direction but differ in the vertical direction, the words are aligned one below the other.
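
The comparison described above can be sketched roughly as follows. This is only an illustration in Python, not LEADTOOLS code: the `Word` tuple is a hypothetical stand-in for the text of a word plus the rectangle reported by OcrCharacter.Bounds for its end-of-word character, and the 10-pixel `tolerance` is an assumed value that would need tuning for a real scan.

```python
from typing import List, Tuple

# (text, left, top, right, bottom) in pixels. Hypothetical stand-in for a
# recognized word and the bounds of its end-of-word character.
Word = Tuple[str, int, int, int, int]


def group_into_columns(words: List[Word], tolerance: int = 10) -> List[List[str]]:
    """Group words whose right edges are almost equal horizontally."""
    # Each entry holds the right edge of the first word placed in the
    # column and the words collected for that column so far.
    columns: List[Tuple[int, List[Word]]] = []
    for word in sorted(words, key=lambda w: w[3]):
        right = word[3]
        if columns and abs(right - columns[-1][0]) <= tolerance:
            columns[-1][1].append(word)
        else:
            columns.append((right, [word]))
    # Inside each column, order the words from top to bottom.
    return [[w[0] for w in sorted(col, key=lambda w: w[2])] for _, col in columns]


sample = [
    ("12", 80, 10, 100, 30),
    ("3.5", 180, 12, 200, 32),
    ("7", 90, 50, 102, 70),
    ("40", 178, 52, 198, 72),
]
print(group_into_columns(sample))  # [['12', '7'], ['3.5', '40']]
```

Right edges are compared here because end-of-word characters sit at the right side of each word; for left-aligned text, comparing left edges may work better.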

I modified your image by drawing a table on it and then performed OCR on it. I am attaching the image and the resulting Word document.

File Attachment(s):
TestImage.zip (26kb) downloaded 24 time(s).
 
#5 Posted : Wednesday, February 24, 2010 7:33:57 AM(UTC)

dimak  

Groups: Registered
Posts: 4


Thanks for the info.
I see that lines are required in order to retrieve data from a table.

In most cases I have to deal with images that result from template removal (the template is dropped out by the scanner based on color). As with some other OCR engines I use, I will have to fall back on manually reconstructing the table from coordinates.

Thanks for your time.
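
For anyone landing on this thread later, a manual reconstruction from coordinates could look roughly like the sketch below. It is a generic illustration in Python under stated assumptions, not the method of any particular OCR engine: each word is a hypothetical `(text, left, top, right, bottom)` tuple taken from the recognized character bounds, rows are found by clustering top coordinates, columns by clustering right edges, and the `tolerance` value is a guess that depends on resolution and font size.

```python
from typing import List, Tuple

# Hypothetical word record: (text, left, top, right, bottom) in pixels.
Word = Tuple[str, int, int, int, int]


def cluster(values: List[int], tolerance: int) -> List[int]:
    """Return one representative (the smallest value) per group of nearby values."""
    centers: List[int] = []
    for v in sorted(values):
        if not centers or v - centers[-1] > tolerance:
            centers.append(v)
    return centers


def nearest(centers: List[int], value: int) -> int:
    """Index of the representative closest to value."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def build_table(words: List[Word], tolerance: int = 10) -> List[List[str]]:
    """Place each word into a row/column grid; missing cells stay empty."""
    row_centers = cluster([w[2] for w in words], tolerance)  # top edges
    col_centers = cluster([w[3] for w in words], tolerance)  # right edges
    table = [["" for _ in col_centers] for _ in row_centers]
    for text, _left, top, right, _bottom in words:
        r = nearest(row_centers, top)
        c = nearest(col_centers, right)
        # Words that fall into the same cell are joined in input order.
        table[r][c] = (table[r][c] + " " + text).strip()
    return table


sample = [
    ("12", 80, 10, 100, 30),
    ("3.5", 180, 12, 200, 32),
    ("7", 90, 50, 102, 70),
]
print(build_table(sample))  # [['12', '3.5'], ['7', '']]
```

Clustering on right edges assumes right-aligned numeric columns; skewed scans or multi-line cells would need deskewing or a more careful row detection first.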
 