#1 Posted : Thursday, March 19, 2009 8:31:54 AM(UTC)

Groups: Registered
Posts: 15


I figured out how to delete unwanted characters from the zoneCharacters collection after running OCR on an IOcrDocument. However, now that I can remove zone characters, I have two new issues:

1) I get run-on words where my program deleted invalid zone characters that were at the end of a line.

OCR Output Text File w/zone characters removed

I now get in the text file:

2) Removing the zone characters does not appear to remove the extra lines. This is more of an issue if I need to save the result in a non-ASCII text format.

OCR Output Text File w/zone characters removed
// <--- leaves blank lines where zone chars were removed
// " " "


Here is the code fragment:
void IterateOcrResults()
{
    foreach (IOcrPage ocrPage in _document.Pages)
    {
        IOcrPageCharacters pageCharacters = ocrPage.GetRecognizedCharacters();
        List<OcrCharacter> delZoneChars = new List<OcrCharacter>();
        foreach (IOcrZoneCharacters zoneCharacters in pageCharacters)
        {
            ICollection<OcrWord> recogWords = zoneCharacters.GetWords(ocrPage.DpiX, ocrPage.DpiY, LogicalUnit.Pixel);
            foreach (OcrWord word in recogWords)
            {
                if (word is bad) // pseudocode: whatever test marks a word as invalid
                {
                    for (int i = word.FirstCharacterIndex; i <= word.LastCharacterIndex; i++)
                    {
                        OcrCharacter zoneCharacter = zoneCharacters[i];
                        // (statement missing from the post; presumably adds zoneCharacter to delZoneChars)
                    }
                }
            }
            // remove invalid zone chars
            foreach (OcrCharacter ocrChar in delZoneChars)
            {
                // (statement missing from the post; presumably removes ocrChar from zoneCharacters)
            }
        }
    }
}


Do you have any suggestions on how to resolve these issues?

Thank you!




#2 Posted : Friday, March 20, 2009 11:06:34 AM(UTC)


Groups: Registered, Tech Support, Administrators
Posts: 764

The problem is most likely that you are deleting the OcrCharacter rather than just modifying it. The OcrCharacter structure has a Position property that flags whether the character is at the end of a line, the end of a paragraph, etc. You can delete the character, but you then need to modify the preceding characters so that the Position information is preserved.

I would suggest simply changing the character code to a space, since it achieves nearly the same result and is much simpler to code. However, if that's not an option for you, you'll need to keep track of the most recent valid character so that, when you come upon a character you want to delete whose Position property is something other than None, you can go back and update that earlier character if necessary.
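
To illustrate both options, here is a rough sketch. It does not use the LEADTOOLS types: RecognizedChar and CharPosition are hypothetical stand-ins that model only a character code and a flags-style position marker, and ZoneCleanup, BlankOut, and RemoveKeepingBreaks are made-up names. It assumes the position value can be combined as bit flags. Adapt the idea to OcrCharacter and its Position property rather than copying the code as-is.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins, NOT the LEADTOOLS types: they model only a character
// code plus a flags-style position marker, as described in the reply above.
[Flags]
public enum CharPosition { None = 0, EndOfLine = 1, EndOfParagraph = 2 }

public struct RecognizedChar
{
    public char Code;
    public CharPosition Position;

    public RecognizedChar(char code, CharPosition position)
    {
        Code = code;
        Position = position;
    }
}

public static class ZoneCleanup
{
    // Option 1: keep every character, but overwrite the unwanted ones with a space.
    // Position markers are left alone, so line and paragraph breaks survive.
    public static void BlankOut(IList<RecognizedChar> chars, ICollection<int> unwanted)
    {
        for (int i = 0; i < chars.Count; i++)
        {
            if (!unwanted.Contains(i)) continue;
            RecognizedChar c = chars[i]; // struct: copy, modify, write back
            c.Code = ' ';
            chars[i] = c;
        }
    }

    // Option 2: really drop the unwanted characters. When a dropped character
    // carries a position marker, merge it into the most recent kept character.
    public static List<RecognizedChar> RemoveKeepingBreaks(IList<RecognizedChar> chars, ICollection<int> unwanted)
    {
        var kept = new List<RecognizedChar>();
        for (int i = 0; i < chars.Count; i++)
        {
            RecognizedChar c = chars[i];
            if (!unwanted.Contains(i))
            {
                kept.Add(c);
                continue;
            }
            if (c.Position != CharPosition.None && kept.Count > 0)
            {
                RecognizedChar last = kept[kept.Count - 1];
                last.Position |= c.Position; // hand the break over to the survivor
                kept[kept.Count - 1] = last;
            }
        }
        return kept;
    }
}
```

With option 2, deleting a word at the end of a line moves the end-of-line marker onto the last character that is kept, which should avoid the run-on words. If every character on a line is deleted, the marker is merged into the previous line's last character, which already ends a line, so no extra blank line is produced in this model.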
#3 Posted : Friday, June 12, 2009 9:08:01 AM(UTC)
Kevin LEAD

Groups: Registered
Posts: 3

Please refer to the latest OcrEditDemo in LEADTOOLS 16.5. It includes functionality that does exactly that.