#1 Posted : Monday, April 14, 2008 11:22:12 AM(UTC)

bsuresh  
bsuresh

Groups: Registered
Posts: 32


Hi Adnan,

We have a table on a page, and one of its column headers is spread over three lines. For example, the header "Last Action Date" does not fit on one line, so it is wrapped (within the same cell) into three lines as below:
Last
Action
Date

How can we make sure that these three words are returned in the same order when I perform OCR on the page? At the moment they are not: text from other column headers is returned in between them.

Thanks
 


#2 Posted : Monday, April 14, 2008 11:25:12 AM(UTC)

bsuresh  
bsuresh

Groups: Registered
Posts: 32


Please check attachment.
bsuresh attached the following image(s):
ocr - words - order - table.JPG
 
#3 Posted : Tuesday, April 15, 2008 3:25:37 AM(UTC)

Adnan Ismail  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

You can define a zone on each column header and perform OCR on that zone. The details depend on which version of LEADTOOLS you are using and which LEADTOOLS programming interface (API, COM objects, or .NET class library) you are using to develop your application.

 
#4 Posted : Tuesday, April 15, 2008 6:59:40 AM(UTC)

bsuresh  
bsuresh

Groups: Registered
Posts: 32


I am using the LEADTOOLS 15 SDK with C#.

Unfortunately, I cannot know in advance whether a document will contain a table or other data. All I want to achieve is to get the list of words on the page in the correct order; this is the requirement. For pages that contain a table, the order of the words is not correct: as in the case above, words that belong to the same column are not reported together because the text is wrapped within the column header. How can we get the correct order?
 
#5 Posted : Thursday, April 17, 2008 2:45:26 AM(UTC)

Qasem Lubani  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)


I tested on your image, and did not define my own zones. Instead, I used the default zones that the engine finds for itself. The result was that the 3 words "Last Action Date" were automatically grouped into one zone, and when I displayed the list of recognized words, they were listed in this exact order.


Can you post or send us the actual full image you're trying to OCR, instead of the partial screen capture?

 
#6 Posted : Thursday, April 17, 2008 8:29:27 AM(UTC)

bsuresh  
bsuresh

Groups: Registered
Posts: 32


Qasem,

Thanks for the response. I am attaching the original page file (TIF) here.

Thanks,
Suresh
File Attachment(s):
Note0001.tif (36kb) downloaded 30 time(s).
 
#7 Posted : Saturday, April 19, 2008 10:28:35 AM(UTC)

bsuresh  
bsuresh

Groups: Registered
Posts: 32


Hi Qasem,

Is there any update on this, please?
 
#8 Posted : Sunday, April 20, 2008 6:16:46 AM(UTC)

Qasem Lubani  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

I have tested here with the full file and I got the same results you did. You can try to achieve this by comparing the X coordinates of the recognized words with each other: if they are close, the words belong to the same column and you can arrange them accordingly.
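
The X-coordinate comparison suggested here could be sketched roughly as follows. This is only an illustration of the idea, written in Python for brevity, and it does not use any LEADTOOLS call: the `Word` class, its `left`/`top`/`right`/`bottom` fields, and the `gap` tolerance are all hypothetical stand-ins for whatever bounding-box data the OCR engine reports for each recognized word. Words whose horizontal extents overlap (or lie within `gap` pixels of each other) are treated as one column, and each column is then read top to bottom.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical container for one recognized word and its bounding box.
    text: str
    left: int
    top: int
    right: int
    bottom: int


def group_into_columns(words, gap=0):
    """Group words whose horizontal extents overlap or lie within `gap` pixels."""
    columns = []
    column_right = None  # right edge of the column currently being built
    for word in sorted(words, key=lambda w: w.left):
        if columns and word.left <= column_right + gap:
            # Overlaps the current column horizontally: same column.
            columns[-1].append(word)
            column_right = max(column_right, word.right)
        else:
            # Starts to the right of everything seen so far: new column.
            columns.append([word])
            column_right = word.right
    return columns


def reading_order(words, gap=0):
    """Return words column by column, top to bottom within each column."""
    ordered = []
    for column in group_into_columns(words, gap):
        ordered.extend(sorted(column, key=lambda w: (w.top, w.left)))
    return ordered


# Made-up header row: "Last / Action / Date" is wrapped inside the middle cell,
# and the words arrive interleaved with the neighbouring headers.
header_words = [
    Word("Last", 100, 10, 140, 22),
    Word("Name", 10, 20, 60, 35),
    Word("Action", 95, 24, 150, 36),
    Word("Status", 200, 20, 260, 35),
    Word("Date", 100, 38, 140, 50),
]

print(" ".join(w.text for w in reading_order(header_words)))
```

Note that this reorders everything it is given column by column, so it is probably best applied only to a region already known to be a table header (or a single table row) rather than to a whole page, where ordinary paragraphs would be scrambled. The coordinates in the example are invented for illustration.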
 