Enhanced OCR Noise Removal Coming Soon

While I was making my rounds through the engineering department, the OCR team showed me some really impressive enhancements coming soon to the Advantage OCR engine. They’ve accomplished a lot, but my personal favorite is what they’ve done to the engine’s preprocessing. With much sweat, tears and coffee, they’ve fine-tuned the noise removal algorithm with impressive results. Other engines may have difficulty seeing between the lines (literally) when forms and documents use separator bars or boxes for individual characters. The LEADTOOLS Advantage OCR engine does a superb job of intelligently removing the noise and returning only the text of interest, rather than getting hung up on bars, dashes, speckles and other types of noise that should simply be ignored.

Before (TIF) and After (Searchable PDF) Screenshot


Beyond the obvious benefit of improved accuracy, this is especially helpful for customers using forms recognition, where character separators are prevalent. On documents where it might previously have been necessary to define a separate zone for each character and piece the results together after recognition, a single zone now suffices, since the separator bars and cells are no longer taken into account.
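
To make the older workaround concrete, here is a minimal Python sketch of the "one zone per character" approach. It is not LEADTOOLS code. The `Zone` type, the `inset` parameter, and the `recognize_zone` callback are all hypothetical names made up for illustration, and the sketch assumes the character cells are evenly spaced across the field.

```python
from typing import Callable, List, NamedTuple


class Zone(NamedTuple):
    """A rectangular region of the page, in pixels (hypothetical type)."""
    left: int
    top: int
    width: int
    height: int


def split_into_char_zones(field: Zone, cells: int, inset: int = 2) -> List[Zone]:
    """Split a boxed field into one zone per character cell.

    Each zone is shrunk by `inset` pixels on every side so that the
    separator bars between cells fall outside the recognized area.
    Assumes the cells are evenly spaced across the field.
    """
    if cells <= 0:
        raise ValueError("cells must be positive")
    cell_width = field.width / cells
    zones = []
    for i in range(cells):
        left = field.left + round(i * cell_width) + inset
        right = field.left + round((i + 1) * cell_width) - inset
        zones.append(
            Zone(
                left,
                field.top + inset,
                max(right - left, 0),
                max(field.height - 2 * inset, 0),
            )
        )
    return zones


def recognize_per_character(
    field: Zone,
    cells: int,
    recognize_zone: Callable[[Zone], str],
) -> str:
    """Recognize each cell separately, then piece the characters together.

    `recognize_zone` stands in for whatever OCR call returns the text
    found inside a single zone; it is an assumption, not a real API.
    """
    return "".join(recognize_zone(z) for z in split_into_char_zones(field, cells))
```

With the improved noise removal described above, this bookkeeping should no longer be needed: the whole field can be passed as one zone and the text read back in a single call.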

You can expect to see these improvements and more in the coming weeks as we release some free updates to Version 18. Stay tuned!

About 

Developer Advocate
