Feature Description

LEADTOOLS provides fast and highly accurate Optical Character Recognition SDK technology for .NET (C# & VB), C/C++, WinRT, iOS, OS X, Java and web. Leverage LEADTOOLS' high-level OCR toolkit to rapidly develop robust, scalable and high-performance recognition and document processing applications that extract text from scanned documents and convert images to text-searchable formats such as PDF, PDF/A, DOC, DOCX, XML and XPS.

The advanced OCR SDK technology in LEADTOOLS is multi-faceted: it can be used as a standalone feature or as the driving force behind more advanced technologies such as forms recognition, check recognition and document conversion. Used on its own, it lets programmers convert an image to a text-searchable document in as few as three lines of code.

With LEADTOOLS' support for more than 40 character sets, programmers can expand their customer base by offering the same solution in many languages, including English, Spanish, French, German, Japanese, Chinese and Arabic.

Overview of LEADTOOLS OCR SDK Technology

Automatic Zone Recognition
Automatically segments an image into zones, which can improve recognition accuracy and efficiency.

Manual Zone Recognition
Allows the user to draw specific regions of interest and recognize only the text within them.

Zone Types

  • Paragraph
  • Text
  • Numeric
  • Table
  • MICR
  • Graphic

LEADTOOLS OCR Engines

The LEADTOOLS OCR SDK contains the following recognition engines:

Advantage OCR Engine

Developed in-house with native libraries for Windows x86/x64, .NET (C# & VB), WinRT, Linux, iOS, OS X and Android.

Benefits of In-House Development

  • New features and enhancements to speed and accuracy are continuously added
  • Customer feedback and feature requests are used to enhance and expand the engine with solutions to real-world applications

Professional OCR Engine

Built on the licensed Nuance OmniPage 18 recognition engine, with native libraries for Windows x86/x64 and .NET.

Interchangeable High-level Interface

LEADTOOLS features a high-level interface that abstracts the individual recognition engines, so switching between them takes a single line of code. For example, the following code selects the recognition engine; changing that one line switches engines while the rest of your code remains unchanged.


// Use Advantage
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
// Or use Professional instead (declare ocrEngine only once in a given scope)
// IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, false);
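
Because the engine is selected by a single enum value, the choice can also be made at run time. The sketch below is one possible way to do that, assuming the engine name arrives as a configuration string. The EngineFactory class and its Create method are hypothetical helpers, not part of LEADTOOLS, and the Leadtools.Forms.Ocr namespace is an assumption that may differ between SDK versions.

```csharp
using System;
using Leadtools.Forms.Ocr; // assumption: the namespace may differ by SDK version

// Hypothetical helper, not part of LEADTOOLS
static class EngineFactory
{
    // engineName is expected to be "Advantage" or "Professional" (case-insensitive)
    public static IOcrEngine Create(string engineName)
    {
        OcrEngineType engineType;

        // Fall back to Advantage when the name is missing or is not a defined enum member
        if (!Enum.TryParse(engineName, true, out engineType) ||
            !Enum.IsDefined(typeof(OcrEngineType), engineType))
        {
            engineType = OcrEngineType.Advantage;
        }

        return OcrEngineManager.CreateEngine(engineType, false);
    }
}
```

A caller would then write, for example, IOcrEngine ocrEngine = EngineFactory.Create("Professional"); and leave the rest of the recognition code untouched.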
            

TIFF to Searchable PDF in Three Lines of Code

The LEADTOOLS OCR SDK provides a high-level programming interface that allows developers to create complex recognition applications quickly. For example, with the AutoRecognizeManager, developers can convert any of more than 150 image formats into a text-searchable format such as PDF or DOC in as few as three lines of code.


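// Create an instance of the Advantage OCR engine
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
// Start the engine with default settings
ocrEngine.Startup(null, null, null, null);
// Recognize the TIFF file and save the result as a text-searchable PDF
ocrEngine.AutoRecognizeManager.Run(@"C:\InputFile.tif", @"C:\OutputFile.pdf",
   DocumentFormat.Pdf, null, null);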
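
The three lines above leave the engine running. In a longer-lived application it is usually worth releasing it explicitly. The following is a minimal sketch of that, assuming IOcrEngine is disposable and exposes IsStarted and Shutdown() (verify both against the SDK version you use). The TiffToPdf class is hypothetical, the file paths are the same placeholders as above, and both namespaces are assumptions.

```csharp
using Leadtools.Forms.Ocr;             // assumption: may differ by SDK version
using Leadtools.Forms.DocumentWriters; // assumption: namespace that defines DocumentFormat

// Hypothetical wrapper class for illustration
static class TiffToPdf
{
    static void Main()
    {
        // The using block disposes the engine even if recognition throws
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
        {
            ocrEngine.Startup(null, null, null, null);
            try
            {
                ocrEngine.AutoRecognizeManager.Run(@"C:\InputFile.tif", @"C:\OutputFile.pdf",
                    DocumentFormat.Pdf, null, null);
            }
            finally
            {
                // Shut the engine down before it is disposed
                if (ocrEngine.IsStarted)
                    ocrEngine.Shutdown();
            }
        }
    }
}
```

The try/finally costs a few extra lines but keeps a failed conversion from leaving a started engine behind.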
            

OCR Languages

LEADTOOLS supports the recognition of more than 40 languages, including:

  • English (en)
  • Afrikaans (af)
  • Albanian (sq)
  • Arabic (ar)
  • Azerbaijani (az)
  • Basque (eu)
  • Belarusian (be)
  • Bulgarian (bg)
  • Catalan (ca)
  • Chinese Simplified (zh-Hans)
  • Chinese Traditional (zh-Hant)
  • Croatian (hr)
  • Czech (cs)
  • Danish (da)
  • Dutch (nl)
  • Estonian (et)
  • Faroese (fo)
  • Finnish (fi)
  • French (fr)
  • Galician (gl)
  • German (de)
  • Greek (el)
  • Hungarian (hu)
  • Icelandic (is)
  • Indonesian (id)
  • Italian (it)
  • Japanese (ja)
  • Korean (ko)
  • Latvian (lv)
  • Lithuanian (lt)
  • Macedonian (mk)
  • Malay (ms)
  • Maltese (mt)
  • Norwegian (no)
  • Polish (pl)
  • Portuguese (pt)
  • Portuguese Brazil (pt-BR)
  • Romanian (ro)
  • Russian (ru)
  • Serbian (sr)
  • Serbian Cyrillic (sr-Cyrl-CS)
  • Slovak (sk)
  • Slovenian (sl)
  • Spanish (es)
  • Swahili (sw)
  • Swedish (sv)
  • Telugu (te)
  • Thai (th)
  • Turkish (tr)
  • Ukrainian (uk)
  • Vietnamese (vi)
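
The codes in parentheses above can be used to limit recognition to the languages a document actually contains. The following is a hedged sketch rather than a definitive recipe: it assumes the engine exposes a LanguageManager with IsLanguageSupported and EnableLanguages methods and that it is called after Startup. The LanguageSetup helper is hypothetical, and the namespace is an assumption, so check every name against your SDK version.

```csharp
using System.Collections.Generic;
using Leadtools.Forms.Ocr; // assumption: the namespace may differ by SDK version

// Hypothetical helper, not part of LEADTOOLS
static class LanguageSetup
{
    // Enables those requested languages that the started engine reports as supported
    public static void Enable(IOcrEngine ocrEngine, params string[] languageCodes)
    {
        List<string> supported = new List<string>();
        foreach (string code in languageCodes)
        {
            // Assumed API: skip codes the current engine cannot recognize
            if (ocrEngine.LanguageManager.IsLanguageSupported(code))
                supported.Add(code);
        }

        // Leave the engine's current language settings alone if nothing matched
        if (supported.Count > 0)
            ocrEngine.LanguageManager.EnableLanguages(supported.ToArray());
    }
}
```

For example, LanguageSetup.Enable(ocrEngine, "en", "fr", "de"); would request English, French and German for a multilingual document.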

Technology Related to OCR