LEADTOOLS OCR SDK

OCR

LEADTOOLS provides state-of-the-art Optical Character Recognition SDK technology that is fast and highly accurate for C/C++, C#, VB.NET, Java and Web developers. Leverage LEADTOOLS' high-level OCR toolkit to rapidly develop robust, scalable and high-performance recognition and document processing applications that extract text from scanned documents and convert images to text-searchable formats such as PDF, PDF/A, DOC, DOCX, XML and XPS.

With LEADTOOLS' extensive support for over forty character sets, programmers can expand their customer base by providing the same solution for many languages, including English, Spanish, French, German, Japanese, Chinese and Arabic.

Overview of LEADTOOLS OCR SDK Technology

  • Fast, accurate and reliable optical character recognition for use in any application or environment
    • Large volume document batch processing
    • Single and multi-page documents
    • Web and Cloud applications
  • Comprehensive multi-thread support for maximum performance
  • Fully featured SDK
    • High-level classes for one-shot recognition from image to final document
    • Low-level functionality for full customization
  • Supports multiple text recognition engines
    • OCR for machine printed text
    • ICR for handwritten text
    • MICR for check processing
    • MRZ and MRP (machine-readable zone and machine-readable passport) for passport numbers
  • Recognize text from over 40 languages and character sets including English, Spanish, French, German, Russian, Japanese, Chinese, Arabic* and more
  • Spell checking and dictionary support
  • Automatically detect, segment and recognize multiple languages on the same document
  • Full-page analysis and zonal recognition
  • Unique color and bitonal image recognition for scanned documents and pictures
  • Automatic document cleanup
    • Omni-directional noise removal
    • Undither text
    • Dot matrix correction
    • Option to remove lines from tables
  • Automatic document pre-processing
    • Deskew of scanned documents
    • Detect and correct the orientation of the document (flipped or reversed) with full document or page by page modes
  • Fully configurable recognition engine
    • Narrow down possible results with character and numeric filters
    • Multiple voting techniques for enhanced accuracy
    • Trace and progress callbacks
    • Enable or disable detection of font characteristics for fast text-only recognition
  • Comprehensive reporting for text results
    • Character location, size and baseline
    • Character attributes (end of word, end of line, end of paragraph, etc.)
    • Font properties (monospace, proportional, serif, sans-serif, bold, italic, underline, strikethrough)
    • Confidence values
    • Obtain the recognized words directly for a zone or page without saving to an external document
  • Output searchable text document formats such as PDF, PDF/A, DOC, DOCX, XML, XPS and more, maintaining the original look and feel
    • Text with detected font characteristics (font-family name, style, size, bold, italic, underline, strikeout, etc.)
    • Location
    • Table reconstruction
    • Layout
    • Graphics
  • Integrates seamlessly with LEADTOOLS Forms Recognition and Processing
  • Implement large-scale distributed OCR applications using the LEADTOOLS Distributed Computing SDK
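
The per-word confidence values listed in the overview above are typically used to decide which results need a second look. The following is a minimal sketch, assuming confidence is reported as an integer from 0 to 100. RecognizedWord and ConfidenceReview are hypothetical types defined here for illustration, not LEADTOOLS classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a recognized word; not a LEADTOOLS type.
public sealed class RecognizedWord
{
    public RecognizedWord(string text, int confidence)
    {
        Text = text;
        Confidence = confidence;
    }

    public string Text { get; }

    // Assumed scale: 0 (no confidence) to 100 (certain).
    public int Confidence { get; }
}

public static class ConfidenceReview
{
    // Returns the words whose confidence falls below the threshold,
    // preserving their original order.
    public static List<RecognizedWord> NeedsReview(IEnumerable<RecognizedWord> words, int threshold)
    {
        return words.Where(w => w.Confidence < threshold).ToList();
    }

    public static void Main()
    {
        var words = new List<RecognizedWord>
        {
            new RecognizedWord("Invoice", 98),
            new RecognizedWord("N0.", 61),
            new RecognizedWord("10423", 93),
        };

        // Prints: N0. (61)
        foreach (var word in NeedsReview(words, 80))
        {
            Console.WriteLine($"{word.Text} ({word.Confidence})");
        }
    }
}
```

The threshold of 80 is arbitrary. In practice it would be tuned against sample documents, trading manual review effort against the risk of accepting misread text.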

Automatic Zone Recognition
Automatically segments an image into zones, which can improve recognition accuracy and efficiency.

Manual Zone Recognition
Allows the user to draw specific regions of interest and recognize the text within them.

Zone Types

  • Paragraph
  • Text
  • Numeric
  • Table
  • MICR
  • Graphic

LEADTOOLS OCR Engines

The LEADTOOLS OCR SDK contains the following recognition engines:

Advantage OCR Engine

Developed in-house with native libraries for Windows x86/x64, .NET, WinRT, iOS, OS X and Android.

Benefits of In-House Development

  • New features and enhancements to speed and accuracy are continuously added
  • Customer feedback and feature requests are used to enhance and expand the engine with solutions to real-world applications

Professional OCR Engine

Built on the licensed Nuance OmniPage 18 recognition engine, with native libraries for Windows x86/x64 and .NET.

Interchangeable High Level Interface

LEADTOOLS features a high-level interface that abstracts the underlying recognition engines, so switching between them takes only a single line of code. For example, the following code determines which recognition engine is used, while the rest of your code remains unchanged.

// Choose one engine; only this line differs, and the rest of the code is unchanged.
// Use Advantage
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
// Or use Professional instead
// IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, false);
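
The same one-line switch can be driven by configuration rather than a hard-coded constant. The sketch below is a minimal, self-contained illustration. EngineKind and EngineSelection are hypothetical stand-ins defined here so the example compiles without the SDK, and the fallback to Advantage is an assumption, not documented LEADTOOLS behavior.

```csharp
using System;

// Stand-in for the SDK's OcrEngineType enum so the sketch compiles without LEADTOOLS.
public enum EngineKind { Advantage, Professional }

public static class EngineSelection
{
    // Maps a configuration string such as "professional" to an engine kind,
    // falling back to Advantage when the value is missing or unrecognized.
    public static EngineKind Parse(string setting)
    {
        EngineKind kind;
        bool ok = Enum.TryParse(setting, true, out kind)
                  && Enum.IsDefined(typeof(EngineKind), kind);
        return ok ? kind : EngineKind.Advantage;
    }

    public static void Main()
    {
        Console.WriteLine(Parse("professional")); // prints Professional
        Console.WriteLine(Parse("unknown"));      // prints Advantage
    }
}
```

In an application, the parsed value would be mapped to the corresponding OcrEngineType and passed to OcrEngineManager.CreateEngine, leaving the rest of the recognition code untouched.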
                            

TIFF to Searchable PDF in Three Lines of Code

The LEADTOOLS OCR SDK provides a high-level programming interface that lets developers build complex recognition applications quickly. For example, with the AutoRecognizeManager, developers can convert any of 150+ image formats into a text-searchable format such as PDF or DOC in as few as three lines of code.

[Image: TIFF to PDF in 3 Lines of Code]

OCR Languages

LEADTOOLS supports the recognition of over forty languages including:

  • English (en)
  • Afrikaans (af)
  • Albanian (sq)
  • Arabic (ar)
  • Basque (eu)
  • Belarusian (be)
  • Bulgarian (bg)
  • Catalan (ca)
  • Chinese Simplified (zh-Hans)
  • Chinese Traditional (zh-Hant)
  • Croatian (hr)
  • Czech (cs)
  • Danish (da)
  • Dutch (nl)
  • Estonian (et)
  • Faroese (fo)
  • Finnish (fi)
  • French (fr)
  • Galician (gl)
  • German (de)
  • Greek (el)
  • Hungarian (hu)
  • Icelandic (is)
  • Japanese (ja)
  • Korean (ko)
  • Indonesian (id)
  • Italian (it)
  • Latvian (lv)
  • Lithuanian (lt)
  • Macedonian (mk)
  • Norwegian (no)
  • Polish (pl)
  • Portuguese (pt)
  • Portuguese Brazil (pt-BR)
  • Romanian (ro)
  • Russian (ru)
  • Serbian (sr)
  • Serbian Cyrillic (sr-Cyrl-CS)
  • Slovak (sk)
  • Slovenian (sl)
  • Spanish (es)
  • Swedish (sv)
  • Turkish (tr)
  • Ukrainian (uk)
  • Vietnamese (vi)

Platforms and Programming Interfaces

LEADTOOLS SDK Products that Include OCR SDK Technology

LEADTOOLS Recognition Imaging SDK

The LEADTOOLS Recognition Imaging SDK is a handpicked collection of LEADTOOLS SDK features designed to build end-to-end document imaging applications as part of an enterprise-level document automation solution that requires scanning, OCR, OMR, forms recognition and processing, archival, annotation and display functionality. This powerful set of tools utilizes LEAD's industry LEADing image processing technology to intelligently identify document features that can be used to recognize any type of scanned or faxed form image.

LEADTOOLS Document Imaging Suite SDK

The LEADTOOLS Document Imaging Suite SDK is a comprehensive collection of LEADTOOLS SDK features designed to build end-to-end document imaging applications within enterprise-level document automation solutions that require capture, OCR, OMR, forms recognition and processing, PDF, print capture, archival, annotation and display functionality. This powerful set of tools utilizes LEAD's industry LEADing image processing technology to intelligently identify document features that can be used to recognize any type of scanned or faxed form image.

LEADTOOLS OCR Module - Advantage

The LEADTOOLS Advantage OCR Module adds methods for incorporating optical character recognition (OCR), intelligent character recognition (ICR) and magnetic ink character recognition (MICR) technology into applications and includes everything needed to develop robust, high performance and scalable image recognition solutions. The LEADTOOLS Advantage OCR Module seamlessly integrates with LEADTOOLS SDKs in the Document and Medical product lines.

LEADTOOLS OCR Module - Professional (* Only product to include Arabic language support)

The LEADTOOLS Professional OCR Module adds methods for incorporating optical character recognition (OCR) technology into applications and includes everything needed to develop robust, high performance and scalable image recognition solutions. The LEADTOOLS Professional OCR Module seamlessly integrates with LEADTOOLS SDKs in the Document and Medical product lines.

LEADTOOLS Arabic OCR Module - Professional

The LEADTOOLS Professional Arabic OCR Module adds methods for incorporating optical character recognition (OCR) technology into applications and includes everything needed to develop robust, high performance and scalable image recognition solutions. The LEADTOOLS Arabic OCR Module seamlessly integrates with LEADTOOLS SDKs in the Document and Medical product lines.

LEADTOOLS Asian OCR Module - Professional

The LEADTOOLS Professional Asian OCR Module adds methods for incorporating optical character recognition (OCR) technology into applications and includes everything needed to develop robust, high performance and scalable image recognition solutions. The LEADTOOLS Asian OCR Module seamlessly integrates with LEADTOOLS SDKs in the Document and Medical product lines. Supported languages include Chinese, Japanese and Korean.

LEADTOOLS ICR Module - Professional

The LEADTOOLS Professional ICR Module adds methods for incorporating intelligent character recognition (ICR) and optical character recognition (OCR) technology into applications and includes everything needed to develop robust, high performance and scalable image recognition solutions. The LEADTOOLS Professional ICR Module includes the LEADTOOLS Professional OCR Module and seamlessly integrates with LEADTOOLS SDKs in the Document and Medical product lines.
