#1 Posted : Thursday, March 19, 2009 8:31:54 AM(UTC)
warren022

Groups: Registered
Posts: 15


v16/VS2008/C#

I figured out how to delete unwanted characters from the zoneCharacters collection after running OCR on an IOcrDocument. However, now that I can remove zone characters, I have two new issues:

1) I get run-on words where my program deleted invalid zone characters that were at the end of a line.

Expected OCR output text file:
FOR
GENERATOR

What I now get in the text file with the zone characters removed:
FORGENERATOR


2) Removing the zone characters does not appear to remove the extra lines. This is more of an issue if I need to save the result in a format other than ASCII text.

OCR output text file with zone characters removed:
2400V
BUS
// <--- blank line left where zone characters were removed
// <--- blank line left where zone characters were removed


---
480V STATION
SERVICE BUS NO 4 YIB

Here is the code fragment:
void IterateOcrResults()
{
    foreach (IOcrPage ocrPage in _document.Pages)
    {
        IOcrPageCharacters pageCharacters = ocrPage.GetRecognizedCharacters();

        foreach (IOcrZoneCharacters zoneCharacters in pageCharacters)
        {
            // Characters to delete from this zone only
            List<OcrCharacter> delZoneChars = new List<OcrCharacter>();
            ICollection<OcrWord> recogWords = zoneCharacters.GetWords(ocrPage.DpiX, ocrPage.DpiY, LogicalUnit.Pixel);

            foreach (OcrWord word in recogWords)
            {
                // IsBadWord is a placeholder for my own validation logic
                if (IsBadWord(word))
                {
                    for (int i = word.FirstCharacterIndex; i <= word.LastCharacterIndex; i++)
                    {
                        delZoneChars.Add(zoneCharacters[i]);
                    }
                }
            }

            // Remove invalid zone chars only after all words have been checked,
            // so the word character indexes stay valid during the loop above
            foreach (OcrCharacter ocrChar in delZoneChars)
            {
                zoneCharacters.Remove(ocrChar);
            }
        }

        // Write the modified characters back once per page
        ocrPage.SetRecognizedCharacters(pageCharacters);
    }
}



Do you have any suggestions on how to resolve these issues?

Thank you!

Warren

#2 Posted : Friday, March 20, 2009 11:06:34 AM(UTC)

GregR

Groups: Registered, Tech Support, Administrators
Posts: 764


The problem is likely that you are deleting the OcrCharacter rather than just modifying it. The OcrCharacter structure has a Position property that flags whether the character is at the end of a line, the end of a paragraph, and so on. You can delete the character, but you then need to modify the preceding characters so that their Position property stays correct.

I would suggest simply changing the character code to a space, since it achieves nearly the same result and is much simpler to code. If that's not an option for you, you'll need some way to keep track of the most recent valid character, so that when you reach a character you want to delete whose Position property is something other than None, you can go back and update that valid character as necessary.
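To make the two options above concrete, here is a minimal C# sketch. It does not use the LEADTOOLS API: CharPosition, ToyChar, ZoneEditing and its methods are hypothetical stand-ins that only model a character code plus a Position flag, and the toy renderer that turns any non-None position into a line break is an assumption, not how LEADTOOLS actually writes its text output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins for OcrCharacter and its Position flags.
// These are NOT the LEADTOOLS types; they only model the idea.
[Flags]
public enum CharPosition { None = 0, EndOfLine = 1, EndOfParagraph = 2 }

public struct ToyChar
{
    public char Code;
    public CharPosition Position;

    public ToyChar(char code, CharPosition position)
    {
        Code = code;
        Position = position;
    }
}

public static class ZoneEditing
{
    // Toy renderer (assumption): writes each code, then a line break after
    // any character whose Position is not None.
    public static string ToText(IList<ToyChar> chars)
    {
        var sb = new StringBuilder();
        foreach (ToyChar c in chars)
        {
            sb.Append(c.Code);
            if (c.Position != CharPosition.None)
                sb.Append('\n');
        }
        return sb.ToString();
    }

    // Option 1 (simplest): overwrite the code of unwanted characters with a
    // space but keep their Position, so line/paragraph breaks survive.
    public static List<ToyChar> ReplaceWithSpaces(IList<ToyChar> chars, Func<ToyChar, bool> isBad)
    {
        var result = new List<ToyChar>(chars.Count);
        foreach (ToyChar c in chars)
            result.Add(isBad(c) ? new ToyChar(' ', c.Position) : c);
        return result;
    }

    // Option 2: really delete unwanted characters, but remember the most
    // recent valid character and move any Position flags of a deleted
    // character onto it. If no valid character precedes it, the flag is dropped.
    public static List<ToyChar> RemoveKeepingPositions(IList<ToyChar> chars, Func<ToyChar, bool> isBad)
    {
        var result = new List<ToyChar>(chars.Count);
        foreach (ToyChar c in chars)
        {
            if (!isBad(c))
            {
                result.Add(c);
            }
            else if (c.Position != CharPosition.None && result.Count > 0)
            {
                // ToyChar is a struct: copy it, modify the copy, write it back.
                ToyChar last = result[result.Count - 1];
                last.Position |= c.Position; // OR merges flags without duplicating them
                result[result.Count - 1] = last;
            }
        }
        return result;
    }
}
```

For a zone "FOR#GENERATOR" where the unwanted '#' carries the end-of-line flag, plain deletion renders FORGENERATOR on one line, ReplaceWithSpaces keeps "FOR " and GENERATOR on separate lines, and RemoveKeepingPositions moves the flag onto the R of FOR so the two words again end up on separate lines.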
 
#3 Posted : Friday, June 12, 2009 9:08:01 AM(UTC)
Kevin LEAD

Groups: Registered
Posts: 3


Please refer to the latest OcrEditDemo in LEADTOOLS 16.5. It includes functionality that does exactly that.