Leadtools.Ocr Introduction
Summary
The LEADTOOLS OCR Class Library provides programming tools for quickly and easily inserting document optical character recognition (OCR) technology into software applications. Using the LEADTOOLS OCR Class Library, programmers can perform character recognition on document images and output recognized text to over 20 file formats.
LEADTOOLS makes OCR development easier with auto-zone detection, manual zone creation, auto-orientation, document image cleanup, and preset values for common document images that can be used to improve recognition results. The LEADTOOLS OCR Class Library supports over 40 languages and features output document options like specifying document margins and paragraph options.
Supported output formats and recognition capabilities include:
- Adobe Portable Document Format (PDF and PDF/A)
- Microsoft Word (DOCX and DOC)
- Hypertext Markup Language (HTML)
- Text (ASCII and UNICODE)
- Microsoft Rich Text Format (RTF)
- Scalable Vector Graphics (SVG)
- Windows Enhanced Metafile (EMF)
- LEAD Temporary Document (LTD)
- Open XML Paper Specification (XPS)
- Microsoft Excel (XLS)
- ALTO XML
- Intelligent Character Recognition (ICR)
OCR Engines
LEADTOOLS OCR offers support for the following OCR engine:
- LEADTOOLS OCR Module - LEAD Engine
The LEADTOOLS OCR Class Library hides the internal workings of each engine behind a uniform class library. If desired, you should be able to switch among the supported OCR engines without changing your application code or logic.
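The idea of hiding engine internals behind one uniform interface can be illustrated with a small, language-neutral sketch. It is written in Python for brevity and is not the LEADTOOLS API: `OcrEngine`, `EngineA`, `EngineB`, `create_engine`, and `run_ocr` are hypothetical names, and the "engines" are stand-ins that only report the input size.

```python
from abc import ABC, abstractmethod


class OcrEngine(ABC):
    """Hypothetical uniform interface; not the LEADTOOLS API."""

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> str:
        """Return the text recognized in the image."""


class EngineA(OcrEngine):
    def recognize(self, image_bytes: bytes) -> str:
        # Stand-in for a real engine: just reports the input size.
        return f"engine-a:{len(image_bytes)}"


class EngineB(OcrEngine):
    def recognize(self, image_bytes: bytes) -> str:
        # A second stand-in with different "internals".
        return f"engine-b:{len(image_bytes)}"


# Registry mapping a configuration key to an engine class.
ENGINES = {"a": EngineA, "b": EngineB}


def create_engine(kind: str) -> OcrEngine:
    """Factory: the only place that knows about concrete engines."""
    return ENGINES[kind]()


def run_ocr(engine: OcrEngine, image_bytes: bytes) -> str:
    # Application logic depends only on the abstract interface,
    # so swapping engines requires no change here.
    return engine.recognize(image_bytes)
```

In this sketch, changing the key passed to `create_engine` is the only difference between using one engine and another; `run_ocr` stays untouched.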
Standard OCR Engine Options
The following standard options are available in all OCR engines:
- OMR (Optical Mark Recognition)
- Auto/manual Zoning
- Formatted Output
- PDF Output
Key Features
- Add support for multi-threading and server-based OCR operations
- Create multiple OCR documents in your application. Each document contains its own list of pages
- Select the language to use when recognizing the OCR pages
- Use dictionaries to improve OCR results (LEAD engine only)
- Recognize a variety of documents, including facsimiles, photocopies, and documents having complex layouts
- Save the document in any of several output document formats, including PDF, MS Word, and plain text
- Correct noise, darkness, and lightness to achieve the best possible character recognition
- Use artificial intelligence to improve recognition on documents of the same type
- Segment complex pages (manually or automatically) into text, image, and table recognition zones
- Use powerful zone recognition tools that can perform the following tasks:
- Recognize an entire page as one zone
- Manually specify and recognize multiple zones within each page
- Perform automatic area segmentation when creating multi-layered zones and recognizing areas such as tables, rulers, images, and text
- Specify multiple specialized options for each zone, including OMR, MRZ, and MICR zones
- Display document pages with or without their zones
- Import zones from and export zones to files
- Recognize text and colors within tables
Additional Features:
- Recognize text from 5 to 72 points in any typeface
- Recognize multiple languages within one document
- Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats
- Process documents in two-page mode for open-faced books and magazines
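Several of the zone features listed above (recognizing an entire page as one zone, specifying zones manually, and importing or exporting zones) amount to treating a zone as a rectangle with a type that can be serialized. The following is a minimal sketch of that idea, assuming a hypothetical `Zone` record and a JSON representation; it does not reflect the LEADTOOLS zone classes or zone file format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Zone:
    """Hypothetical zone record: a rectangle plus a zone type."""
    left: int
    top: int
    width: int
    height: int
    zone_type: str = "text"  # illustrative labels, e.g. "text", "table", "omr"


def full_page_zone(page_width: int, page_height: int) -> list[Zone]:
    """Treat the entire page as a single text zone."""
    return [Zone(0, 0, page_width, page_height)]


def export_zones(zones: list[Zone]) -> str:
    """Serialize zones to JSON text (assumed format, for illustration)."""
    return json.dumps([asdict(z) for z in zones])


def import_zones(text: str) -> list[Zone]:
    """Rebuild zones from JSON text produced by export_zones."""
    return [Zone(**item) for item in json.loads(text)]
```

For example, `full_page_zone(850, 1100) + [Zone(10, 20, 300, 40, "omr")]` describes a page with one full-page text zone and one manually specified zone, and passing that list through `export_zones` and then `import_zones` returns an equal list. Writing the JSON text to a file would be a straightforward extension.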
Supported Environments
See Also
Reference
Getting Started (Guide to Example Programs)
Programming with LEADTOOLS .NET OCR
Creating an OCR Engine Instance
Starting and Shutting Down the OCR Engine
OCR Spell Language Dictionaries
Working with OCR Languages
Working with OCR Pages
Working with OCR Zones
Recognizing OCR Pages
OCR Confidence Reporting
OCR Engine-specific Settings
OCR/ICR Tutorials
OCR Languages and Spell-Checking
Using OMR in LEADTOOLS .NET OCR
Multi-Threading with LEADTOOLS OCR
Version History
Leadtools.Ocr Assembly Changes