LEAD Technologies, Inc

Forms Recognition and Processing

The LEADTOOLS Forms Recognition and Processing engine provides developers with a comprehensive set of advanced tools to create form recognition and processing applications with minimal coding. It is fast, accurate and reliable.

Overview
Forms Recognition
Forms Processing
Speed Processing
Low Level Forms Recognition
Low Level Forms Processing

Overview. In general, forms recognition and processing systems perform the following steps:

The LEADTOOLS AutoFormsEngine is a fast, optimized implementation of the recognition and processing system. It automatically creates the attributes of a form, compares them with the Master Forms in the repository, and processes the form fields. LEADTOOLS AutoForms can also run in multi-threaded mode, which speeds up recognition and processing by taking advantage of multi-core processors. The Leadtools.Forms.Auto namespace provides a rich set of classes, interfaces, and methods that reduce the implementation time in actions such as:

While most ECM (Enterprise Content Management) systems may take advantage of both recognition and processing, each process, recognition or processing, has a very specific task in a typical workflow.

Forms Recognition. Forms Recognition is the process of automatically identifying the name, type, and ID of an unknown form without human intervention. As long as a master form exists for the form being recognized, the LEADTOOLS Recognition Engine can quickly and accurately distinguish it from an unlimited number of predefined master forms. The engine uses an extremely accurate algorithm to extract the unique features (attributes) of each master form (single or multipage) and stores them in an XML file. This file is portable and efficient, so you no longer need to store the original images of your master forms, which frees up disk space. Once you have created master forms for all of the forms you expect to process, you can fully automate recognition regardless of the source (archive, scanner, etc.), the resolution, or whether the form is deformed or computer-generated.

Our industry-leading recognition engine allows you to fine-tune the engine for the types of forms you expect to process. Many factors can be considered when creating each master form's attributes, such as text, barcodes, and unique objects in the form. LEADTOOLS has created dedicated sub-engines (referred to by the SDK as "Object Managers") to handle these different factors. The Object Managers allow you to choose which factors are considered when creating master form attributes. You can use a single Object Manager or a group of them. Each manager has its own purpose, so choosing the appropriate manager increases the performance and accuracy of forms recognition. For example, if all of the forms you expect to recognize have unique barcodes, the Barcode Manager alone is most likely sufficient. You could add other managers as well, but the processing time they consume would be wasted. In addition to automatically creating form attributes through the different Object Managers, the engine has an optional feature that allows you to highlight important information in the form, such as the company or form name. No matter which Object Manager is used, the engine provides comprehensive recognition results, including a confidence level for each form. The Forms Recognition Engine provides the following "Object Managers":

OCR Manager (requires a LEADTOOLS OCR Engine) - The OCR Manager uses OCR to extract the text features from a form to create the form's attributes. The OCR Manager can be used with any OCR Engine LEADTOOLS provides, such as the Plus and Professional Engines. The OCR Manager is the optimal manager and can recognize forms that were scanned under conditions different from those of the master form (resolution, alignment, etc.). It uses an internal algorithm that calculates the amount of scale and shift in the unidentified form to provide a completely automatic alignment solution.

Barcode Manager (requires a LEADTOOLS Barcode Engine) - The Barcode Manager uses barcode recognition technology to extract the barcode features from a form to create the form's attributes. This manager can accurately recognize forms in fractions of a second, even for large images. The Barcode Manager can be used with any Barcode Engine LEADTOOLS provides, such as the 1D and 2D (DataMatrix, PDF417, QR) add-on modules. The Barcode Manager uses the image resolution to calculate the alignment, so it is ideal for recognizing forms that have different resolutions but similar scales and shifts. Since most forms already contain some type of unique barcode, the Barcode Manager is a good fit for most scenarios.

Default Manager (no add-on required) - The Default Manager extracts special object features, such as lines and inverted text, from a form to create the form's attributes. This manager is useful for simple forms that have unique lines and other objects. Although the Default Manager is accurate, the Barcode and OCR Managers should be used for optimal performance and accuracy. The Default Manager uses the image resolution to calculate the alignment. Consequently, it is ideal for recognizing forms that are generated at different resolutions but with similar scales and shifts.

Forms Recognition works by creating a FormRecognitionAttributes object for each Master Form and for each form you would like to recognize, and then comparing the attributes to see which Master Form matches each form with the highest confidence. The following is an outline of the general steps involved in performing Form Recognition on one or more pages.

  1. Create the Master Forms Repository that points to the storage location of the Master Forms.

    Code

                string root = @"C:\Forms\FormsDemo\OCR_Test";
                RasterCodecs codecs = new RasterCodecs();
                DiskMasterFormsRepository repository = new DiskMasterFormsRepository(codecs, root);
                
    
  2. Create the OCR and Barcode engines to be used in the Auto-Forms Engine.

    Code

                // The list element type must match the declared type (IOcrEngine)
                List<IOcrEngine> ocrEngines = new List<IOcrEngine>();
                IOcrEngine ocrEngine;
                // Create one OCR engine per thread; four threads are used here
                int numberOfThreads = 4;
                for (int i = 0; i < numberOfThreads; i++)
                {
                   ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, true);
                   ocrEngine.Startup(null, null, null, null);
                   ocrEngines.Add(ocrEngine);
                }
                BarcodeEngine barcodeEngine = new BarcodeEngine();
                
    
  3. Create the Auto-Forms Engine using the AutoFormsEngine Class.

    Code

                AutoFormsEngine autoEngine = new AutoFormsEngine(repository, ocrEngines, barcodeEngine, 30, 80, true);
                
    
  4. Call autoEngine.Run to recognize and process the form in one step, or call autoEngine.RecognizeForm to only recognize the form.

    Code

                AutoFormsRunResult result = autoEngine.Run(image, null, null, null);
                
    
For a detailed outline of how to recognize and process a form, see Steps To Recognize and Process a Form.
For a detailed outline of how to generate a Master Form, see Steps To Generate Master Form and save it to master's repository.

The Leadtools.Forms.Auto namespace provides a set of classes and interfaces for automated forms recognition and processing with multi-threaded processing. If you want to implement your own multi-threading, you can disable multi-threading in Auto Forms or use the Low Level Forms design. The framework handles Form Categories using Repositories. LEADTOOLS provides sample implementations of disk-based form repositories. You can also implement the framework's interfaces (IMasterForm, IMasterFormsCategory, IMasterFormsRepository) to create your own custom repository.
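For readers who disable the built-in multi-threading and manage threads themselves, the following is a minimal sketch of one possible pattern: keep a fixed pool of engine objects and make sure no two concurrent workers share one. It is written in plain C# and does not call any LEADTOOLS API. The names EnginePoolSketch, ProcessAll, and processOne are hypothetical and exist only for illustration, and TEngine stands in for whatever per-thread engine object your application creates.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class EnginePoolSketch
{
    // Processes every form in parallel without sharing an engine between two
    // concurrent workers. TEngine is a placeholder type (an assumption of this
    // sketch); processOne is the caller's own per-form logic.
    public static IDictionary<TForm, TResult> ProcessAll<TEngine, TForm, TResult>(
        IEnumerable<TEngine> engines,
        IEnumerable<TForm> forms,
        Func<TEngine, TForm, TResult> processOne)
    {
        var pool = new ConcurrentBag<TEngine>(engines);
        var results = new ConcurrentDictionary<TForm, TResult>();

        // Never run more workers than there are engines in the pool.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Math.Max(1, pool.Count)
        };

        Parallel.ForEach(forms, options, form =>
        {
            // Each worker borrows one engine for the duration of one form.
            if (!pool.TryTake(out TEngine engine))
                throw new InvalidOperationException("No free engine in the pool.");
            try
            {
                results[form] = processOne(engine, form);
            }
            finally
            {
                pool.Add(engine); // hand the engine back for the next form
            }
        });

        return results;
    }
}
```

Because the degree of parallelism is capped at the pool size, a worker always finds a free engine, so the pool never needs to block. If the same form key appears twice, the later result overwrites the earlier one.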

Forms Processing. Forms Processing is the process of extracting filled-in data from predefined fields in a form. Fields are defined per page, so fields for a multipage form can easily be created and data extracted from the desired field and page. Each field has the following attributes associated with it:

Field information can be processed regardless of image resolution, scale, and other form generation characteristics. No matter which field type is used, the engine provides comprehensive processing results, including a confidence value for each result. The Forms Processing Engine provides predefined field types for text, OMR (checkbox), barcode, and image data.

In addition to the above predefined Field Types, the Processing Engine allows you to create your own custom fields for any unique needs you may have.

The following is an outline of the general steps involved in performing Forms Processing on one or more pages.

  1. Create the Master Forms Repository that points to the storage location of the Master Forms.

    Code

                string root = @"C:\Forms\FormsDemo\OCR_Test";
                RasterCodecs codecs = new RasterCodecs();
                DiskMasterFormsRepository repository = new DiskMasterFormsRepository(codecs, root);
                
    
  2. Create the OCR and Barcode engines to be used in Auto-Forms Engine.

    Code

                // The list element type must match the declared type (IOcrEngine)
                List<IOcrEngine> ocrEngines = new List<IOcrEngine>();
                IOcrEngine ocrEngine;
                // Create one OCR engine per thread; four threads are used here
                int numberOfThreads = 4;
                for (int i = 0; i < numberOfThreads; i++)
                {
                   ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, true);
                   ocrEngine.Startup(null, null, null, null);
                   ocrEngines.Add(ocrEngine);
                }
                BarcodeEngine barcodeEngine = new BarcodeEngine();
                
    
  3. Create the Auto-Forms Engine using the AutoFormsEngine Class.

    Code

                AutoFormsEngine autoEngine = new AutoFormsEngine(repository, ocrEngines, barcodeEngine, 30, 80, true);
                
    
  4. Call autoEngine.Run to recognize and process the form in one step, or call autoEngine.ProcessForm to only process the form.

    Code

                AutoFormsRunResult result = autoEngine.Run(image, null, null, null);
                
    
For a detailed outline of how to recognize and process a form, see Steps To Recognize and Process a Form.
For a detailed outline of how to generate a Master Form, see Steps To Generate Master Form and save it to master's repository.

How to speed up Forms Recognition and Processing
  1. Use the multi-threaded mode of the AutoFormsEngine.
  2. If you are performing both recognition and processing, initialize the AutoFormsEngine with only the OCR engines, and use the OCR Professional engine.
  3. If you are performing recognition without processing and all of your Master Forms have different barcodes, use only the Barcode engine to generate the Master Forms' attributes and to initialize the AutoFormsEngine.

Low Level Forms Recognition. Low Level Forms Recognition makes it possible to design custom algorithms for recognition and forms comparisons. The following is an outline of the general steps involved in performing Form Recognition on one or more pages.

  1. Create and initialize the Forms Recognition Engine using the FormRecognitionEngine Class.
  2. Create and add the desired Object Managers using the RecognitionObjectsManager Class.
  3. Create the attributes for one or more Master Forms using the CreateMasterForm Method.
  4. Add pages to the Master Form using the AddMasterFormPage Method.
  5. Close the Master Form using the CloseMasterForm Method.
  6. Create form attributes for the forms you would like to recognize using the CreateForm Method.
  7. Add Pages to the form to be recognized using the AddFormPage Method.
  8. Close the form to be recognized using the CloseForm Method.
  9. Compare the attributes for the form to be recognized to the attributes of each master form using the CompareForm Method.

Master Form attributes can be saved to and loaded from disk using the GetData and SetData Methods. In most cases, save all master form attributes to disk. Then, when recognizing filled forms, load each master form's attributes file and compare it with the attributes of the form being recognized to see which master form returns the highest confidence value. For a simple tutorial using Forms Recognition, see Recognizing Forms.
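The selection step at the end of that comparison loop can be sketched as follows. This is plain C# that does not call any LEADTOOLS API: the Candidate class and the SelectBest method are hypothetical stand-ins for the confidence values you would collect after comparing the form against each master form. The minimumConfidence cutoff is also an assumption of this sketch, added so that a weak best match can be treated as "not recognized".

```csharp
using System;
using System.Collections.Generic;

public static class BestMatchSketch
{
    // Hypothetical stand-in for one comparison outcome: a master form name and
    // the confidence (0 to 100) obtained when the filled form was compared to it.
    public sealed class Candidate
    {
        public string MasterName { get; }
        public int Confidence { get; }

        public Candidate(string masterName, int confidence)
        {
            MasterName = masterName;
            Confidence = confidence;
        }
    }

    // Returns the candidate with the highest confidence, or null when no
    // candidate reaches minimumConfidence (the form is treated as unrecognized).
    public static Candidate SelectBest(IEnumerable<Candidate> candidates, int minimumConfidence)
    {
        Candidate best = null;
        foreach (Candidate c in candidates)
        {
            bool aboveCutoff = c.Confidence >= minimumConfidence;
            if (aboveCutoff && (best == null || c.Confidence > best.Confidence))
                best = c;
        }
        return best;
    }
}
```

For example, given candidates with confidences 42, 91, and 67 and a cutoff of 80, SelectBest returns the candidate with confidence 91. With a cutoff of 95 it returns null. When two candidates tie, the first one encountered is kept.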

Low Level Forms Processing. Low Level Forms Processing makes it possible to customize alignment and processing. The following is an outline of the general steps involved in performing Form Processing on one or more pages.

  1. Create and initialize the Forms Processing Engine using the FormProcessingEngine Class.
  2. Add the desired fields for each Master Form using the TextFormField, OMRFormField, BarcodeFormField, ImageFormField, or a custom user-defined field.
  3. Create a form page for each field collection using the FormPage.AddRange Method.
  4. Add each field page to the processing engine using the FormProcessingEngine.Pages.Add Method.
  5. Process the fields using the Process Method. This method requires the alignment for the given image. If Forms Recognition has not been performed and you know which form is being processed, use the GetFormAlignment or GetPageAlignment Method. If recognition has been performed, use the Alignment property.

Fields can be loaded from and saved to disk using the LoadFields and SaveFields Methods. In most cases, save all of the fields for each master form to disk. Then, when processing filled forms, load the appropriate form fields from file for use in the FormProcessingEngine. For a simple tutorial using Forms Processing, see Processing Forms.

SDK Definitions
Attributes
Unique features of a Master Form used to identify filled forms in the forms recognition process.
Barcode Manager
An Object Manager that creates attributes based on barcode fields in the master form.
Confidence
Value from 0 to 100 representing how confident the results are. A value of "100" means full confidence while a value of "0" means no confidence.
Default Manager
An Object Manager that creates attributes based on unique objects, such as lines and inverted text, in the master form.
Exclude region
An area which has no features or attributes necessary for form recognition.
Field
A predefined area on a recognized form from which you need to extract text, barcode, checkbox, image, or custom data.
Filled Data
Any data a user created on a form in a predefined field. Using the LEADTOOLS Forms Processing Engine, this data can be extracted from a recognized form.
Filled Form
A Master Form containing filled data. The Forms Recognition and Processing Engine is used to uniquely identify filled forms and extract the data from their fields.
Form
A filled form which needs to be recognized and/or processed.
Form Alignment
Information necessary for aligning a complete recognized form with the corresponding master form.
Form Category
A collection or logical grouping of similar Master Forms in a Form Repository. A Form Category can contain Master Forms and/or sub categories.
Forms Processing
The process of extracting user filled data from predefined fields in a recognized form.
Forms Recognition
The process of matching a filled form to its corresponding Master Form.
Forms Repository
A storage system for Form Categories. This is the top-level of the collection.
Include region
An area whose features or attributes are necessary for form recognition.
Master Form / Template
An unfilled or blank form containing attributes unique to that form. Master Forms can be single or multipage. Master Form attributes are generated by the different Object Managers.
Object Manager
A sub-engine that generates attributes for a specific master form.
OCR Manager
An Object Manager that creates attributes based on text fields in the master form.
Page Alignment
Information necessary for aligning a recognized form page with the corresponding page from the master form.
Region of interest
An area which has very important attributes necessary for form recognition. These regions are used to highlight important features such as the company or form name.


© 2006-2012 All Rights Reserved. LEAD Technologies, Inc.