Feature Description

LEADTOOLS provides fast and highly accurate Optical Character Recognition SDK technology for .NET (C# & VB), C/C++, WinRT, iOS, macOS, Java, and web. Leverage the high-level LEADTOOLS OCR toolkit to rapidly develop robust, scalable, and high-performance recognition and document processing applications that extract text from scanned documents and convert images to text-searchable formats such as PDF, PDF/A, DOC, DOCX, XML, and XPS.

The advanced OCR SDK technology in LEADTOOLS is multi-faceted and can be used as a standalone feature as well as the driving force behind more advanced technologies such as forms recognition, check recognition, and document conversion. Used on its own, it lets programmers convert an image to a text-searchable document in as few as three lines of code.

With extensive support for more than forty character sets, programmers can expand their customer base by providing the same solution for many languages, including English, Spanish, French, German, Japanese, Chinese, and Arabic.

Overview of LEADTOOLS OCR SDK Technology

Automatic Zone Recognition
Automatically segments an image into zones, which can improve recognition accuracy and efficiency

Manual Zone Recognition
Allows the user to draw specific regions of interest and recognize text from them

Zone Types

  • Paragraph
  • Text
  • Numeric
  • Table
  • MICR
  • Graphic

Speed and Reliability

  • Fast, accurate, and reliable optical character recognition for use in any application and environment
    • Large volume document batch processing
    • Single and multi-page documents
    • Tablet and mobile devices
    • Web and Cloud applications
  • Utilize multiple cores for unparalleled performance

Accuracy

  • Spell checking dictionary support
  • Automatically detect, segment, and recognize multiple languages on the same document
  • Full-page analysis and Zonal recognition
    • Automatic table area segmentation
    • Automatic OMR area segmentation
    • Automatic vertical text segmentation with detection of its orientation angle (0, 90, or 270 degrees)
    • Automatic segmentation of vertical text within horizontal text pages
  • Automatic document cleanup
    • Omni-directional noise removal
    • Undither text
    • Dot matrix correction
    • Option to remove lines from tables
  • Automatic document preprocessing
    • Deskew of scanned document
    • Detect and correct the orientation of the document (flipped or reversed) in multi-page or single-page mode
    • Remove borders
    • Split pages
  • Unique color and bitonal image recognition for scanned documents and pictures can detect text regardless of foreground/background colors
  • Output searchable text document formats such as PDF, PDF/A, DOC, DOCX, XML, XPS, and more, maintaining the original look and feel
    • Text with detected font characteristics (font-family name, style, size, bold, italic, underline, strikeout, slope angle, etc.)
    • Location
    • Table reconstruction
    • Layout
    • Graphics

Versatility

  • Fully featured SDK
    • High-level classes for one-shot recognition from image to final document
    • Low-level functionality for full customization
  • Supports multiple text recognition engines
  • Fully configurable recognition engine
    • Narrow down possible results with character and numeric filters
    • Multiple voting techniques for enhanced accuracy
    • Trace and progress callbacks
    • Enable/disable font characteristics for fast text-only recognition
  • Comprehensive results reporting
    • Character location, size and baseline
    • Character attributes (end of word, end of line, end of paragraph, etc.)
    • Font properties (monospace, proportional, serif, sans-serif, bold, italic, underline, strikethrough)
    • Confidence values
    • Obtain the recognized words directly for a zone or page without saving to an external document
  • Integrates seamlessly with LEADTOOLS Forms Recognition and Processing
  • Implement large-scale distributed OCR applications using LEADTOOLS Distributed Computing SDK
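
As a rough illustration of the lower-level path mentioned above (recognizing a page and reading its text directly, without saving to an external document), the following is a minimal sketch rather than a definitive implementation. It assumes a valid LEADTOOLS license has already been set, that the input file path (reused here from the TIFF example on this page) exists, and that the toolkit version in use exposes the members shown; member names may differ between toolkit versions.

```csharp
using System;
using Leadtools.Ocr;

class PageTextSketch
{
    static void Main()
    {
        // Assumption: the LEADTOOLS license was set earlier in the application.
        IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
        ocrEngine.Startup(null, null, null, null);
        try
        {
            // Create an in-memory OCR document and add one page from an image file.
            using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
            {
                IOcrPage ocrPage = ocrDocument.Pages.AddPage(@"C:\InputFile.tif", null);

                // Segment the page into zones automatically, then recognize them.
                ocrPage.AutoZone(null);
                ocrPage.Recognize(null);

                // A zone index of -1 requests the text of every zone on the page.
                string text = ocrPage.GetText(-1);
                Console.WriteLine(text);
            }
        }
        finally
        {
            // Release engine resources even if recognition fails.
            ocrEngine.Shutdown();
            ocrEngine.Dispose();
        }
    }
}
```

The one-shot AutoRecognizeManager is the shorter route when the goal is a finished document; a page-level loop like this one is only worth the extra code when the application needs the recognized text itself.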

Cross-Platform

  • Native, full-featured Optical Character Recognition libraries for many platforms
    • Windows desktops, servers, and phones
    • iOS and macOS
    • Android
    • Web Services
  • Native mobile libraries run on the device without needing to call external services
  • Utilize camera preview on phones and tablets for real-time text recognition and processing
  • Specialized image processing and recognition for deformations unique to mobile devices
    • 3D Perspective Deskew
    • Keystone Correction
    • Fixed-focus camera optimizations
  • Programming interfaces closely resemble each other, making it easy to port an application to multiple platforms

LEADTOOLS OCR Engines

The LEADTOOLS OCR SDK contains the following recognition engines:

Advantage OCR Engine

Developed in-house with native libraries for Windows x86/x64, .NET (C# & VB), WinRT, Linux, iOS, macOS, and Android.

Benefits of In-House Development

  • New features and enhancements to speed and accuracy are continuously added
  • Customer feedback and feature requests are used to enhance and expand the engine with solutions to real-world applications

Professional OCR Engine

Built on the licensed Nuance OmniPage 18 recognition engine, with native libraries for Windows x86/x64 and .NET.

Interchangeable High-level Interface

LEADTOOLS features a high-level interface that abstracts the differences between recognition engines, so switching engines requires changing only a single line of code. For example, the line that creates the engine determines which recognition engine is used, while the rest of your code remains unchanged.


// Use Advantage
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
// Or use Professional instead (replace the line above; declaring ocrEngine twice will not compile)
// IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, false);
      

TIFF to Searchable PDF in Three Lines of Code

The LEADTOOLS OCR SDK provides a high-level programming interface that allows developers to create complex recognition applications quickly. For example, with the AutoRecognizeManager, developers can convert any of 150+ image formats into a text-searchable format such as PDF or DOC in as few as three lines of code!


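// Create the OCR engine (Advantage shown here)
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
// Start the engine with default settings
ocrEngine.Startup(null, null, null, null);
// Recognize the TIFF and save the result as a text-searchable PDF
ocrEngine.AutoRecognizeManager.Run(@"C:\InputFile.tif", @"C:\OutputFile.pdf",
   DocumentFormat.Pdf, null, null);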
      

OCR Languages

Recognize text in more than 40 languages and character sets, including:

  • English (en)
  • Afrikaans (af)
  • Albanian (sq)
  • Arabic (ar)
  • Azerbaijani (az)
  • Basque (eu)
  • Belarusian (be)
  • Bulgarian (bg)
  • Catalan (ca)
  • Chinese Simplified (zh-Hans)
  • Chinese Traditional (zh-Hant)
  • Croatian (hr)
  • Czech (cs)
  • Danish (da)
  • Dutch (nl)
  • Estonian (et)
  • Faroese (fo)
  • Finnish (fi)
  • French (fr)
  • Galician (gl)
  • German (de)
  • Greek (el)
  • Hungarian (hu)
  • Icelandic (is)
  • Indonesian (id)
  • Italian (it)
  • Japanese (ja)
  • Korean (ko)
  • Latvian (lv)
  • Lithuanian (lt)
  • Macedonian (mk)
  • Malay (ms)
  • Maltese (mt)
  • Norwegian (no)
  • Polish (pl)
  • Portuguese (pt)
  • Portuguese Brazil (pt-BR)
  • Romanian (ro)
  • Russian (ru)
  • Serbian (sr)
  • Serbian Cyrillic (sr-Cyrl-CS)
  • Slovak (sk)
  • Slovenian (sl)
  • Spanish (es)
  • Swahili (sw)
  • Swedish (sv)
  • Telugu (te)
  • Thai (th)
  • Turkish (tr)
  • Ukrainian (uk)
  • Vietnamese (vi)
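
The codes in parentheses are what an application would typically pass to the engine to choose recognition languages. The following is a minimal sketch, not a definitive implementation: it assumes a valid LEADTOOLS license has already been set, that the English and German language data are installed for the selected engine, and that the toolkit version in use exposes the language manager members shown.

```csharp
using System;
using Leadtools.Ocr;

class EnableLanguagesSketch
{
    static void Main()
    {
        // Assumption: the LEADTOOLS license was set earlier in the application.
        IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
        ocrEngine.Startup(null, null, null, null);
        try
        {
            // Enable English and German using the codes from the list above.
            ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

            // Print the languages the engine will now use for recognition.
            foreach (string language in ocrEngine.LanguageManager.GetEnabledLanguages())
            {
                Console.WriteLine(language);
            }
        }
        finally
        {
            // Release engine resources.
            ocrEngine.Shutdown();
            ocrEngine.Dispose();
        }
    }
}
```

Enabling only the languages a document is expected to contain is usually the safer choice, since each extra language widens the set of candidate characters the engine has to consider.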

Technology Related to OCR