OCR Module, Image File Size

This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.

#1 Posted : Wednesday, March 10, 2010 9:42:16 PM(UTC)

Majid Chaudry

Groups: Registered
Posts: 3

Hi,

I am evaluating the OCR module from C# on .NET.

In my scenario I can come across files that consist of multiple images and are about 300-500 MB in size. I want to know how the application would behave in such a scenario and what effect it would have on the .NET environment and other system resources. Could you also advise on the best application design for this case?

Kind regards,

Majid Ali Chaudry



#2 Posted : Thursday, March 11, 2010 5:48:46 AM(UTC)

Adnan Ismail

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

Majid,
There are limits on the size of each page, but not on the total file size. This means if your pages are normal size (such as Letter or A4 page size), and the file has hundreds or thousands of pages, you should be able to use our OCR module with such files.

The best approach would be to divide the processing into groups, each consisting of a few pages. You can then combine the OCR output into one big file if you want to.
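The grouping step can be sketched independently of any OCR library. The following is a minimal illustration, assuming pages are numbered from 1 and that each group is described by an inclusive first/last pair; the `PageBatches` class and its `Split` method are hypothetical names introduced here and are not part of LEADTOOLS.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not a LEADTOOLS type.
public static class PageBatches
{
    // Splits pages 1..totalPages into consecutive, 1-based, inclusive
    // ranges that each hold at most batchSize pages.
    public static List<(int First, int Last)> Split(int totalPages, int batchSize)
    {
        if (totalPages < 0) throw new ArgumentOutOfRangeException(nameof(totalPages));
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var ranges = new List<(int First, int Last)>();
        int first = 1;
        while (first <= totalPages)
        {
            int remaining = totalPages - first + 1;       // pages left, including 'first'
            int last = first + Math.Min(batchSize, remaining) - 1;
            ranges.Add((first, last));
            if (last == totalPages) break;                // avoids overflow on last + 1
            first = last + 1;
        }
        return ranges;
    }
}
```

For a 10-page file and groups of 4, `PageBatches.Split(10, 4)` yields the ranges 1-4, 5-8, and 9-10. Each range can then be loaded and recognized on its own, so only one group of pages needs to be in memory at a time.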

#3 Posted : Thursday, March 11, 2010 6:26:37 PM(UTC)

Majid Chaudry

Groups: Registered
Posts: 3

Adnan,

Thank you for your prompt response. I am having some problems coding the approach suggested above:

1. While loading file pages into the OCRDocument object, I have no way of knowing how many pages the document contains. If I load the whole document by passing -1 to the AddPages method, that is going to be heavy on memory. Kindly elaborate on the process so that I can implement it accordingly.

2. I can definitely process (recognize) pages in batches, but the resulting text stays with the OCRDocument object (at least as I understand it). To get the text in a specific format (say PDF), I have to call the save method on the OCRDocument object, not on individual pages. So how do I get the text in batches and then combine it into a single document? There is a RecognizeText method on the OCRPage object, but it only returns text.

Best Regards,

Majid Ali Chaudry

#4 Posted : Sunday, March 14, 2010 8:02:39 AM(UTC)

Adnan Ismail

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

Majid,

1- To know how many pages your file contains, use the CodecsImageInfo class and check the return value of the CodecsImageInfo.TotalPages property.
For more details about this class, check this page:
http://www.leadtools.com/help/leadtools/v16/DH/CO/Leadtools.Codecs~Leadtools.Codecs.CodecsImageInfo_members.html
After that, you can use one of the AddPages() method overloads that handles multi-page files.

2- About combining the results of several OCR operations into one output file, you can save the output result using the LEADTOOLS Temporary Document Format. After that, you can combine these results in one file format using the Leadtools.Forms.DocumentWriters.Convert Method.
For more details about the LEADTOOLS Temporary Document Format, see this page:
http://www.leadtools.com/help/leadtools/v16/dh/to/leadtools.topics~leadtools.topics.fileformatsdocltd.html

Also, the following post has an example that you can try:
http://support.leadtools.com/CS/forums/18746/ShowPost.aspx#29161
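
The two answers above fit together as a three-stage flow: read the page count, recognize each page range into its own intermediate file, then merge the intermediate files. The sketch below shows only that control flow. It deliberately makes no LEADTOOLS calls: `BatchedOcr`, `Run`, and the three delegates (`getPageCount`, `recognizeBatch`, `merge`) are hypothetical placeholders for the library operations mentioned in this thread, and their signatures are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical orchestration helper; the delegates stand in for library calls.
public static class BatchedOcr
{
    // Returns the intermediate files produced, in page order.
    public static List<string> Run(
        string sourceFile,
        string outputFile,
        int batchSize,
        Func<string, int> getPageCount,                  // e.g. reads TotalPages
        Func<string, int, int, string> recognizeBatch,   // (file, first, last) -> intermediate file
        Action<IReadOnlyList<string>, string> merge)     // (intermediate files, final output)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int totalPages = getPageCount(sourceFile);
        var partFiles = new List<string>();

        for (int first = 1; first <= totalPages; )
        {
            // Inclusive last page of this batch, clamped to the end of the file.
            int last = first + Math.Min(batchSize, totalPages - first + 1) - 1;
            partFiles.Add(recognizeBatch(sourceFile, first, last));
            if (last == totalPages) break;
            first = last + 1;
        }

        // Merge only when at least one batch was produced.
        if (partFiles.Count > 0) merge(partFiles, outputFile);
        return partFiles;
    }
}
```

In terms of this thread, `getPageCount` would wrap the CodecsImageInfo.TotalPages lookup, `recognizeBatch` would wrap AddPages over the given range followed by recognition and a save in the LEADTOOLS Temporary Document Format, and `merge` would wrap the Convert method. The exact calls and overloads depend on the LEADTOOLS version, so check them against the linked documentation.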
