#1 Posted : Tuesday, September 18, 2018 2:27:37 PM(UTC)
Josh Clark

When recognizing forms, you can usually gauge how accurate the recognition is from the confidence value. When simply creating a document with OCR, however, you may not be able to tell how accurate the recognition is until after the document is produced. Thankfully, the StatisticsInformationCommand lets us check beforehand whether the image needs preprocessing. The demo I have included checks how good the contrast is by calculating how much of the image's histogram falls within a midtone range derived from the image's darkest and lightest values. If the midtones make up more than 50% of the image, it runs the StretchIntensityCommand and then the ChangeContrastCommand to produce much greater contrast for recognition. A sample snippet from the project is as follows:

using System;
using Leadtools;
using Leadtools.ImageProcessing.Color;

// Assumes 'image' is a RasterImage that has already been loaded (e.g. with RasterCodecs.Load)
// Set the threshold for how much gray to include in the white/black regions
int threshold = 5;

// Run the stats command once; Minimum and Maximum are only filled in by Run
StatisticsInformationCommand statsCommand = new StatisticsInformationCommand();
statsCommand.Run(image);

// Use these values to select the "gray" region between the two endpoints, then run again
statsCommand.Start = statsCommand.Minimum + threshold;
statsCommand.End = statsCommand.Maximum - threshold;
statsCommand.Run(image);

// statsCommand.Percent now holds how much of the image is "gray" (in neither the black nor the white region)
Console.WriteLine("This image has {0:0.00}% of gray in it", statsCommand.Percent);

// If the gray region is over 50%, perform image cleanup
if (statsCommand.Percent > 50)
{
    Console.WriteLine("Performing image cleanup...");

    // Stretch intensity to make the darkest color black and the lightest color white
    StretchIntensityCommand stretchIntensityCommand = new StretchIntensityCommand();
    stretchIntensityCommand.Run(image);

    // Then increase the contrast to reduce the midtones
    ChangeContrastCommand changeContrastCommand = new ChangeContrastCommand();
    changeContrastCommand.Contrast = 1000;
    changeContrastCommand.Run(image);
    Console.WriteLine("Done cleanup.");
}
else
{
    Console.WriteLine("Cleanup is not needed for this document");
}
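To make it clearer what the midtone check is measuring, here is a rough plain-C# sketch of the same idea on an 8-bit grayscale pixel array, with no LEADTOOLS dependency. It builds a histogram, finds the darkest and lightest values present, and reports what percentage of pixels fall between min + threshold and max - threshold. It also includes a simple linear stretch that maps the darkest value to black and the lightest to white. MidtoneSketch, MidtonePercent and StretchIntensity are hypothetical names used only for illustration; the real StatisticsInformationCommand and StretchIntensityCommand may compute things differently (for example, per color channel or at other bit depths).

```csharp
using System;

// Plain-C# sketch of the demo's logic on an 8-bit grayscale pixel buffer.
// MidtoneSketch and its methods are hypothetical helpers, not LEADTOOLS APIs.
public static class MidtoneSketch
{
    // Percentage of pixels whose value lies in [min + threshold, max - threshold],
    // where min and max are the darkest and lightest values present in the image.
    public static double MidtonePercent(byte[] pixels, int threshold)
    {
        if (pixels == null || pixels.Length == 0)
            throw new ArgumentException("pixels must not be empty", nameof(pixels));

        // 256-bin histogram of the gray values
        int[] histogram = new int[256];
        foreach (byte p in pixels)
            histogram[p]++;

        // Darkest and lightest values that actually occur
        int min = Array.FindIndex(histogram, count => count > 0);
        int max = Array.FindLastIndex(histogram, count => count > 0);

        int start = min + threshold;
        int end = max - threshold;
        if (start > end)
            return 0.0; // range too narrow to contain a "gray" region

        long inRange = 0;
        for (int v = start; v <= end; v++)
            inRange += histogram[v];

        return 100.0 * inRange / pixels.Length;
    }

    // Generic linear stretch: maps the darkest value to 0 and the lightest to 255.
    // Illustration only; StretchIntensityCommand may differ in detail.
    public static byte[] StretchIntensity(byte[] pixels)
    {
        if (pixels == null || pixels.Length == 0)
            throw new ArgumentException("pixels must not be empty", nameof(pixels));

        int min = 255, max = 0;
        foreach (byte p in pixels)
        {
            if (p < min) min = p;
            if (p > max) max = p;
        }

        byte[] result = new byte[pixels.Length];
        if (max == min)
        {
            // Flat image: nothing to stretch, return an unchanged copy
            Array.Copy(pixels, result, pixels.Length);
            return result;
        }

        for (int i = 0; i < pixels.Length; i++)
            result[i] = (byte)((pixels[i] - min) * 255 / (max - min));
        return result;
    }
}
```

With threshold = 5, an image whose pixels are half pure black or white and half mid-gray (value 128) returns exactly 50.0, which would not trigger the demo's "> 50" cleanup condition; a few more gray pixels would.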

A sample document, created specifically for this project, can be found here. Note that this document will not be recognized properly without adjustment.

The full source of this project can be downloaded here:
OCR image cleanup demo.zip (4 KB)

Edited by user Wednesday, September 19, 2018 10:50:37 AM(UTC)  | Reason: typo

Josh Clark
Developer Support Engineer
LEAD Technologies, Inc.


