Programming with the LEADTOOLS Forms Recognition and Processing Engine

The LEADTOOLS Forms Recognition and Processing engine provides developers with a comprehensive set of advanced tools to create form recognition and processing applications with minimal coding. It is fast, accurate, and reliable.

Overview

In general, forms recognition and processing systems perform the following steps:

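LEAD offers two levels of functions: the High-Level AutoFormsEngine, and the Low-Level Forms Recognition and Forms Processing engines. Each is described in the sections below.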

High-Level AutoFormsEngine

The LEADTOOLS AutoFormsEngine is an optimized implementation of the recognition and processing system. Built upon the Low-Level engine, it automatically creates the form attributes, compares them with the Master Forms in the repository, and processes the form fields. To speed up recognition and processing, the LEADTOOLS AutoFormsEngine can also take advantage of multi-core processors when set to use its multi-threaded mode. The Leadtools.Forms.Auto namespace provides a rich set of classes, interfaces, and methods that will reduce the implementation time of actions such as:

While most ECM (Enterprise Content Management) systems may take advantage of both recognition and processing, each recognition or processing step has a very specific task in a typical workflow.

High-Level Forms Recognition

Forms Recognition is the process of automatically identifying the name, type, and ID of any unknown form without human intervention. As long as a master form exists for the form being recognized, the LEADTOOLS Recognition Engine can quickly and accurately distinguish it from an unlimited number of pre-defined master forms. The engine uses an extremely accurate algorithm to extract the unique features (attributes) of each master form (single or multipage) and store them in an XML file. This file is portable and efficient: since it is no longer necessary to store all of the original images for your master forms, the disk space can be reclaimed. After creating master forms for all of the forms you expect to process, the recognition process can be automated for all forms. No longer will it matter which source (archival, scanner, etc.) or resolution is used; nor will it matter whether the form is deformed, computer-generated, etc.

The LEADTOOLS Recognition Engine is the industry-leading recognition engine. With it, programmers can fine-tune the engine for the specific types of forms to be processed. There are many factors to consider when creating a master form's attributes, including the text, barcodes, and unique objects in the form.

LEADTOOLS has created unique sub-engines (referred to by the SDK as "Object Managers") to handle all of these different factors. Object Managers allow you to choose the factors which should be considered when creating a master form's attributes. You can use a single Object Manager, or a group of them. Each manager has a unique purpose: choosing the appropriate manager will increase the performance and accuracy of forms recognition. For example, if all forms are expected to have unique barcodes, most likely only the Barcode Manager would be needed. Other managers could be used as well, but the Barcode Manager would be the only one necessary (potentially saving the processing time that the other managers would spend).

In addition to automatically creating form attributes through the different Object Managers, the engine can highlight important information in the form, such as the company or form name. No matter which Object Manager is used, the engine provides you with comprehensive results about the recognition, including a confidence level for each form. The Forms Recognition Engine provides the following Object Managers: the Default Manager, the OCR Manager, and the Barcode Manager (each is described under SDK Definitions).

Forms Recognition works by creating FormRecognitionAttributes objects for each Master Form and for each form you would like to recognize, and then comparing the form's attributes with those of the Master Forms in a repository. Master Form selection is based on the confidence values returned from the comparisons. If the confidence value for a particular comparison is above the minimum confidence value specified, the search stops, which saves the time that unnecessary form comparisons would take. The following general steps outline how to perform Form Recognition on one or more pages.

  1. Create the Master Forms Repository that points to the storage location of the Master Forms.
    SetLicense();

    // Set the name of the folder that contains the Master Forms
    string root = @"C:\Users\Public\Documents\LEADTOOLS Images\Forms\MasterForm Sets\OCR\";

    RasterCodecs codecs = new RasterCodecs();
    DiskMasterFormsRepository repository = new DiskMasterFormsRepository(codecs, root);
  2. Create the OCR and Barcode engines to be used in the Auto-Forms Engine.
    // Create the OCR engine instance, and use the LEADTOOLS OCR Module - LEAD Engine
    IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, false);

    // Start up the OCR engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 20\Bin\Common\OcrLEADRuntime");

    // Create the Main BarcodeEngine instance
    BarcodeEngine barcodeEngine = new BarcodeEngine();
  3. Create the Auto-Forms Engine using the AutoFormsEngine Class.
    // Create the Main AutoFormsEngine instance
    AutoFormsEngine autoEngine = new AutoFormsEngine(repository, ocrEngine, barcodeEngine, 30, 80, true);

    // Set up the recognition options
    autoEngine.RecognizeFirstPageOnly = true;
    autoEngine.MinimumConfidenceKnownForm = 40;

    // Get a list of the files to process
    string[] files = Directory.GetFiles(@"C:\Users\Public\Documents\LEADTOOLS Images\Forms\Forms to be Recognized\OCR\", "*.tif");

    // Call the function that contains the main recognition process
    ProcessFiles(autoEngine, files);
  4. Call AutoFormsEngine.Run to both recognize and process the form at once, or call AutoFormsEngine.RecognizeForm to only recognize the form. The following code shows how to use the AutoFormsEngine class in a multi-threaded application:

    private static void ProcessFiles(AutoFormsEngine autoEngine, string[] files) 
    { 
       Console.WriteLine("Started Processing Files ..."); 
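       // Get the number of files to process 
       int fileCount = files.Length; 

       // Nothing to process: return now, because the event below would never be signaled 
       if (fileCount == 0) 
       { 
          Console.WriteLine("No files to process"); 
          return; 
       } 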
                        
       // Event to notify us when all work is finished 
       using (AutoResetEvent finishedEvent = new AutoResetEvent(false)) 
       { 
          // Loop through all Files in the given Folder 
          foreach (string file in files) 
          { 
             // Capture the file name here, since we are using an anonymous function 
             string fileToProcess = file; 
                              
             // Process it in a thread 
             ThreadPool.QueueUserWorkItem((state) => 
             { 
                try 
                { 
                   // Show the name 
                   string name = Path.GetFileName(fileToProcess); 
                   Console.WriteLine("Processing {0}", name); 
                                    
                   // Process it 
                   AutoFormsRunResult result = autoEngine.Run(fileToProcess, null); 
                                    
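                   // Check results (the null check on result is a defensive guard for the case where no form is matched) 
                   if (result != null && result.FormFields != null && result.RecognitionResult.MasterForm != null) 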
                      Console.WriteLine(string.Format("  Master Form Found \"{0}\" for {1}", result.RecognitionResult.MasterForm.Name, name)); 
                   else 
                      Console.WriteLine(string.Format("  No Master Form Found for {0}", name)); 
                } 
                catch(Exception ex) 
                { 
                   Console.WriteLine("Error {0}", ex.Message); 
                } 
                finally 
                { 
                   if (Interlocked.Decrement(ref fileCount) == 0) 
                   { 
                      // We are done, inform the main thread 
                      finishedEvent.Set(); 
                   } 
                } 
             }); 
          } 
                           
          // Wait till all operations are finished 
          finishedEvent.WaitOne(); 
          Console.WriteLine("Finished Processing Files"); 
       } 
    } 

High-Level Forms Processing

Forms Processing is the process of extracting the filled-in data from pre-defined fields in a form. Fields are defined per page, so fields for a form comprised of several pages can easily be created and their data extracted from the desired field or page. Each field has the following attributes associated with it:

Field information can be processed regardless of image resolution, scale, or other form generation characteristics. No matter which field type is being used, the engine provides comprehensive results of the processing, including a confidence value for each result. The Forms Processing Engine provides the following field types: text, OMR (check box), barcode, and image.

In addition to the above pre-defined Field Types, the Processing Engine allows you to create your own custom fields for any unique needs you may have.

How to Speed up Forms Recognition and Processing

1. Use the multi-threaded mode of the AutoFormsEngine.
2. When performing both recognition and processing, initialize the AutoFormsEngine with only the OCR engine, and use the LEADTOOLS OCR Module - LEAD Engine.
3. When performing recognition without processing, and all Master Forms have different barcodes, use only the Barcode engine to generate the Master Form attributes and to initialize the AutoFormsEngine.

Low-Level Forms Recognition

Low-Level Forms Recognition makes it possible to design custom algorithms for recognition and forms comparisons. Whereas the High-Level Forms Recognition searches for the first form that meets or exceeds the Minimum Confidence value, the Low-Level Forms Recognition performs the search over all master forms in the repository to find the form with the highest confidence value. Note that this requires all the master forms in the repository to be loaded.
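
The difference between the two search strategies can be sketched in a few lines of C#. This is a minimal illustration and not LEADTOOLS code: the Candidate and MasterFormSearch types are hypothetical, and the confidence values are supplied directly instead of being computed by comparing form attributes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a master form together with the confidence
// (0 to 100) obtained when a filled form was compared against it.
public sealed class Candidate
{
    public string Name { get; }
    public int Confidence { get; }

    public Candidate(string name, int confidence)
    {
        Name = name;
        Confidence = confidence;
    }
}

public static class MasterFormSearch
{
    // High-level style: return the first candidate whose confidence meets or
    // exceeds the minimum and stop comparing. Returns null if none qualifies.
    public static Candidate FirstAboveMinimum(IEnumerable<Candidate> candidates, int minimumConfidence)
    {
        foreach (Candidate c in candidates)
        {
            if (c.Confidence >= minimumConfidence)
                return c;
        }
        return null;
    }

    // Low-level style: examine every candidate and keep the highest confidence.
    // Returns null for an empty sequence; on a tie the earliest candidate wins.
    public static Candidate HighestConfidence(IEnumerable<Candidate> candidates)
    {
        Candidate best = null;
        foreach (Candidate c in candidates)
        {
            if (best == null || c.Confidence > best.Confidence)
                best = c;
        }
        return best;
    }
}
```

Given candidates A (45), B (92), and C (60) in that order and a minimum confidence of 40, FirstAboveMinimum returns A after a single comparison, while HighestConfidence visits all three and returns B. The first approach trades possible accuracy for speed; the second always pays the cost of visiting every master form.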

The general steps involved in performing Form Recognition on one or more pages are shown in the following outline.

1. Create and initialize the Forms Recognition Engine using the FormRecognitionEngine Class.
2. Create and add the desired Object Managers using the RecognitionObjectsManager Class.
3. Create the Master Form (or several) attributes using the CreateMasterForm Method.
4. Add pages to the Master Form using the AddMasterFormPage Method.
5. Close the Master Form using the CloseMasterForm Method.
6. Create form attributes for the forms you would like to recognize using the CreateForm Method.
7. Add Pages to the form to be recognized using the AddFormPage Method.
8. Close the form to be recognized using the CloseForm Method.
9. Compare the attributes for the form to be recognized to the attributes of each master form using the CompareForm Method.

Master Form attributes can be loaded and saved to disk using the GetData and SetData methods. In most cases, you should save all master form attributes to disk. Then, when recognizing filled forms, load each master form attributes file and compare it with the attributes of the form being recognized to see which returns the highest confidence value. Note that the search has to be performed over all master forms in the repository. For a simple tutorial using Forms Recognition, refer to Recognizing Forms.

Low-Level Forms Processing

Low-Level Forms Processing makes it possible to customize alignment and processing. The general steps involved in performing Form Processing on one or more pages are shown in the following outline.

1. Create and initialize the Forms Processing Engine using the FormProcessingEngine Class.
2. Add the desired fields for each Master Form using the TextFormField, OMRFormField, BarcodeFormField, ImageFormField, or a custom user-defined field.
3. Create a form page for each field collection using the FormPage.AddRange Method.
4. Add each field page to the processing engine using the FormProcessingEngine.Pages.Add Method.
5. Process the fields using the Process Method. This method requires the alignment for the given image. If Forms Recognition has not been performed and you know which form is being processed, call the GetFormAlignment or GetPageAlignment method. If recognition has been performed, use the Alignment property.


Fields can be loaded and saved to disk using the LoadFields and SaveFields methods. In most cases, you should save all of the fields for each master form to disk. Then, when processing filled forms, load the appropriate form fields from file for use in the FormProcessingEngine. For a simple tutorial using Forms Processing, refer to Processing Forms.

SDK Definitions

Attributes
Unique features of a Master Form used to identify filled forms in the forms recognition process.

Barcode Manager
An Object Manager that creates attributes based on barcode fields in the master form.

Confidence
Value from 0 to 100 representing the amount of confidence in the results. A value of "100" means full confidence while a value of "0" means no confidence.

Default Manager
An Object Manager that creates attributes based on unique objects such as lines and inverted text in the master form.

Exclude region
An area which has no features or attributes necessary for form recognition.

Field
A pre-defined area on a recognized form from which you need to extract text, barcode, check box, image, or custom data.

Filled Data
Any data a user added to a form in a pre-defined field. Using the LEADTOOLS Forms Processing Engine, this data can be extracted from a recognized form.

Filled Form
A Master Form containing filled data. The Forms Recognition and Processing Engine is used to uniquely identify the form and extract the data from its fields.

Form
A filled form which needs to be recognized and/or processed.

Form Alignment
Information necessary to align a recognized form with its corresponding master form.

Form Category
A collection or logical grouping of similar Master Forms in a Form Repository. A Form Category can contain Master Forms and/or sub categories.

Forms Processing
The process of extracting user-filled data from pre-defined fields in a recognized form.

Forms Recognition
The process of identifying which Master Form a filled form corresponds to.

Forms Repository
A storage system for Form Categories. This is the top-level of the collection.

Include region
An area which has features or attributes necessary for form recognition.

Master Form / Template
An unfilled or blank form containing attributes unique to that form. Master Forms can be single or multipage. Master Form attributes are generated by the different Object Managers.

Object Manager
A sub-engine that generates attributes for a specific master form.

OCR Manager
An Object Manager that creates attributes based on text fields in the master form.

Page Alignment
Information necessary to align a recognized form page with the corresponding page from the master form.

Region of interest
An area which has very important attributes necessary for form recognition. These regions are used to highlight important features such as the company or form name.

Help Version 20.0.2020.4.3
© 1991-2020 LEAD Technologies, Inc. All Rights Reserved.