Combine Forms Recognition with the Document Analyzer - C# .NET 6

This tutorial shows how to use Master Forms and the LEADTOOLS Document Analyzer to create, recognize, and process forms with structured and unstructured fields in a C# .NET 6 application.

Overview
Summary: This tutorial covers how to recognize and process both fields and rulesets of a form using the Document Analyzer's AutoFormsEngine in a C# .NET 6 Console application.
Completion Time: 20 minutes
Visual Studio Project: Download tutorial project (1 MB)
Platform: C# .NET 6 Console Application
IDE: Visual Studio 2022
Runtime Target: .NET 6 or higher
Development License: Download LEADTOOLS

Required Knowledge

Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial before working on the Combine Forms Recognition with the Document Analyzer - C# .NET 6 tutorial.

Definitions

Structured and Unstructured Fields

Forms come in a variety of shapes and sizes, each with a varying number of fields filled with information. The types of information stored in these fields can be quite similar or differ drastically depending on the type of form that is used. The LEADTOOLS SDK provides a variety of classes and interfaces for automated processing of forms, allowing for quick and efficient detection of both structured and unstructured fields.

Structured Fields

The fields in the form have static locations and similar data types, and they can be defined in a ruleset. Every structured form of the same type should contain the same fields for information, with matching types and structure. Some examples are tax forms for US citizens, such as 1040EZ or W-2 forms, where fields are expected to appear at predefined locations and contain similar data types.

Unstructured Fields

The fields in the form do not have predefined characteristics, such as type or location. A ruleset defined for another structured or unstructured form will likely not be able to perfectly detect all of the fields on an unstructured form, so one must be created for that specific form to define its fields. An example of this could be a newly created form that is specialized for an activity or some unique reporting process.

Create the Project and Add LEADTOOLS References

Start with a copy of the project created in the Add References and Set a License tutorial. If the project is not available, follow the steps in that tutorial to create it.

The references needed depend upon the purpose of the project. References can be added via NuGet packages.

This tutorial requires the following NuGet package:

Alternatively, if NuGet packages are not used, the following DLLs are required:

For a complete list of which DLL files are required for your application, refer to Files to be Included With Your Application.

Set the License File

The License unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.
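The Main() method later in this tutorial calls InitLEAD(), which comes from the Add References and Set a License tutorial and is not repeated here. For orientation, a minimal sketch of such a method might look like the following. The license and key paths are assumptions based on a default LEADTOOLS 23 installation and must be replaced with the location of your own files.

```csharp
using System;
using System.IO;
using Leadtools;

// Sketch of the license helper that Main() calls as InitLEAD().
// ASSUMPTION: the paths below are the default locations of a LEADTOOLS 23
// installation; point them at your own license and key files.
static void InitLEAD()
{
   string license = @"C:\LEADTOOLS23\Support\Common\License\LEADTOOLS.LIC";
   string key = File.ReadAllText(@"C:\LEADTOOLS23\Support\Common\License\LEADTOOLS.LIC.KEY");

   // Must run before any other toolkit call
   RasterSupport.SetLicense(license, key);

   if (RasterSupport.KernelExpired)
      Console.WriteLine("License file invalid or expired.");
   else
      Console.WriteLine("License file set successfully.");
}
```

If the project already contains the method from the prerequisite tutorial, keep that version and skip this sketch.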

There are two types of runtime licenses:

Initialize RasterCodecs, IOcrEngine, and AutoFormsEngine

With the project created, the references added, and the license set, coding can begin.

In the Solution Explorer, open Program.cs. Add the following statements to the using block at the top of Program.cs.

C#
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Document; 
using Leadtools.Document.Analytics; 
using Leadtools.Document.Data; 
using Leadtools.Document.Unstructured; 
using Leadtools.Forms.Auto; 
using Leadtools.Forms.Processing; 
using Leadtools.Forms.Recognition; 
using Leadtools.Ocr; 
using Newtonsoft.Json; 

Add the following static fields to the Program class.

C#
private static AutoFormsEngine autoEngine; 
private static RasterCodecs codecs; 
private static IOcrEngine ocrEngine; 
private static DiskMasterFormsRepository masterFormsRepository; 
private static string masterFormSetDirectory; 
private static string filledFormDirectory; 

Set masterFormSetDirectory and filledFormDirectory to the directories that contain the Master Form sets and the Filled Forms to be recognized, as seen below. A set of Master Forms and Filled Forms is available for download for use with this tutorial.

Then, add a new method to the Program class named InitFormsEngines() and call it inside Main(), below the call that sets the license.

C#
static void Main(string[] args) 
{ 
   try 
   { 
      string projectRoot = Directory.GetParent(Environment.CurrentDirectory).Parent.Parent.FullName; 
      masterFormSetDirectory = Path.Combine(projectRoot, "MasterForm Sets"); 
      filledFormDirectory = Path.Combine(projectRoot, "FilledForms"); 
 
      // Startup 
      InitLEAD(); 
      InitFormsEngines(); 
 
      // Cleanup 
      autoEngine.Dispose(); 
      if (ocrEngine != null && ocrEngine.IsStarted) 
         ocrEngine.Shutdown(); 
   } 
   catch (Exception ex) 
   { 
      Console.WriteLine(ex.Message); 
   } 
} 

Note The above code assumes that the Master Form and Filled Form resources provided above are located at the base of the project directory. Ensure that these directories are changed to the correct paths containing the Master Forms and Filled Forms to be used.
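Because the directory layout is left to the reader, it can help to fail fast with a clear message when either folder is missing, rather than letting the forms engine fail later. The following is a minimal sketch and not part of the original tutorial; the helper name RequireDirectory is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a LEADTOOLS API): returns the full path when the
// directory exists, otherwise throws with a message naming the missing folder.
static string RequireDirectory(string path, string description)
{
   string fullPath = Path.GetFullPath(path);
   if (!Directory.Exists(fullPath))
      throw new DirectoryNotFoundException($"{description} directory not found: {fullPath}");
   return fullPath;
}
```

In Main(), the two assignments could then be wrapped, for example masterFormSetDirectory = RequireDirectory(Path.Combine(projectRoot, "MasterForm Sets"), "Master Form"). The existing catch block would print the message and the application would exit before any engine is started.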

Add the code below to the InitFormsEngines() method to initialize the RasterCodecs, IOcrEngine, DiskMasterFormsRepository, and AutoFormsEngine objects.

C#
private static void InitFormsEngines() 
{ 
   codecs = new RasterCodecs(); 
 
   ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD); 
   ocrEngine.Startup(codecs, null, null, @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime"); 
 
   masterFormsRepository = new DiskMasterFormsRepository(codecs, masterFormSetDirectory); 
   autoEngine = new AutoFormsEngine(masterFormsRepository, ocrEngine, null, AutoFormsRecognitionManager.Default | AutoFormsRecognitionManager.Ocr, 30, 80, true); 
} 

Add the Form Recognition Code

In the Program class, add two new methods called RecognizeForm(string formToRecognize) and ShowProcessedResults(AutoFormsRunResult runResult). RecognizeForm() will be called inside the Main() method, below the call to InitFormsEngines(), and it calls ShowProcessedResults() itself.

Add the code below to the RecognizeForm() method to recognize the forms located within the directory represented by the filledFormDirectory variable.

C#
private static AutoFormsRunResult RecognizeForm(string formToRecognize) 
{ 
   string resultMessage = "Form not recognized"; 
 
   Console.WriteLine("Attempting to classify {0}...", Path.GetFileName(formToRecognize)); 
   AutoFormsRunResult runResult = autoEngine.Run(formToRecognize, null); 
   if (runResult != null) 
   { 
      FormRecognitionResult recognitionResult = runResult.RecognitionResult.Result; 
      resultMessage = $"This form has been recognized as a {runResult.RecognitionResult.MasterForm.Name} with {recognitionResult.Confidence}% confidence.\n"; 
   } 
 
   Console.WriteLine(resultMessage); 
 
   ShowProcessedResults(runResult); 
   return runResult; 
} 

Add the code below to the ShowProcessedResults(AutoFormsRunResult runResult) method to show the recognition results:

C#
private static void ShowProcessedResults(AutoFormsRunResult runResult) 
{ 
   if (runResult == null) 
      return; 
 
   string resultsMessage = ""; 
   try 
   { 
      foreach (FormPage formPage in runResult.FormFields) 
         foreach (FormField field in formPage) 
            // Only text results are printed; null fields and other result types are skipped 
            if (field?.Result is TextFormFieldResult textResult) 
               resultsMessage = $"{resultsMessage}{field.Name} = {textResult.Text}\n"; 
   } 
   catch (Exception ex) 
   { 
      Console.WriteLine(ex.Message); 
   } 
 
   Console.WriteLine("Field Processing Results:"); 
   if (string.IsNullOrEmpty(resultsMessage)) 
      Console.WriteLine("No fields were processed"); 
   else 
      Console.WriteLine(resultsMessage); 
} 

Note

The method ShowProcessedResults(AutoFormsRunResult runResult) is only necessary if the user wishes to see the structured fields within a form. If only the results of a form's rulesets are desired, as covered below, this method and its respective calls can be removed.

Add the call to RecognizeForm() to the Main() method, below the call to InitFormsEngines(), to recognize the desired forms and display the results. RecognizeForm() calls ShowProcessedResults() internally, so no separate call is needed. Main() should now look like this:

C#
static void Main(string[] args) 
{ 
   try 
   { 
      string projectRoot = Directory.GetParent(Environment.CurrentDirectory).Parent.Parent.FullName; 
      masterFormSetDirectory = Path.Combine(projectRoot, "MasterForm Sets"); 
      filledFormDirectory = Path.Combine(projectRoot, "FilledForms"); 
 
      // Startup 
      InitLEAD(); 
      InitFormsEngines(); 
 
      // Recognize forms 
      DirectoryInfo filledFormDir = new DirectoryInfo(filledFormDirectory); 
      FileInfo[] forms = filledFormDir.GetFiles(); 
      Console.WriteLine("# of Forms Detected: {0}\n", forms.Length); 
 
      foreach (FileInfo form in forms) 
      { 
         string currFormName = form.FullName; 
         AutoFormsRunResult runResult = RecognizeForm(currFormName); 
 
         Console.WriteLine("========================================================================="); 
      } 
 
      // Cleanup 
      autoEngine.Dispose(); 
      if (ocrEngine != null && ocrEngine.IsStarted) 
         ocrEngine.Shutdown(); 
   } 
   catch (Exception ex) 
   { 
      Console.WriteLine(ex.Message); 
   } 
} 
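The loop above passes every file in filledFormDirectory to RecognizeForm(), including any stray files such as Thumbs.db or text notes. One way to guard against that is to filter by extension first. This is a sketch only; the helper name GetFormFiles is hypothetical, and the extension list is an assumption that should be adjusted to the formats of your filled forms.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: returns only the files whose extension is in the allowed
// set, sorted by name so the console output has a stable order.
static FileInfo[] GetFormFiles(string directory)
{
   // ASSUMPTION: these are the image/document formats used for filled forms
   var allowed = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
   {
      ".tif", ".tiff", ".pdf", ".png", ".jpg", ".jpeg"
   };

   return new DirectoryInfo(directory)
      .GetFiles()
      .Where(f => allowed.Contains(f.Extension))
      .OrderBy(f => f.Name, StringComparer.OrdinalIgnoreCase)
      .ToArray();
}
```

In Main(), FileInfo[] forms = GetFormFiles(filledFormDirectory); would then take the place of the filledFormDir.GetFiles() call.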

Run Rulesets Using the Document Analyzer

In the Program class, create two new methods called GetRulesetDirectory(string masterFormName) and RunRuleset(string formToRecognize, string ruleset). These new methods will be called from Main() and will process any rulesets for the corresponding Master Form that are contained within the DiskMasterFormsRepository (including all subfolders).

Add the following code to the new GetRulesetDirectory(string masterFormName) method to build the path of the folder containing a form's rulesets:

C#
private static string GetRulesetDirectory(string masterFormName) 
{ 
   return Path.Combine(masterFormSetDirectory, masterFormName, "Rulesets"); 
} 

Note

This method assumes that your Master Form directory organizational structure follows the same format as the Master Form sets and Filled Form resources provided above. Adjust this method as necessary to match the structure of your given Master Form repository.
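If some Master Forms in your repository have no Rulesets folder, enumerating that folder directly throws a DirectoryNotFoundException. A slightly more defensive variant is sketched below; the helper name FindRulesets is hypothetical, and it assumes the same MasterFormSet/FormName/Rulesets layout described in the note above.

```csharp
using System;
using System.IO;

// Hypothetical helper: lists the *.json rulesets for a Master Form, or returns
// an empty array when the form has no "Rulesets" folder.
static string[] FindRulesets(string masterFormSetDirectory, string masterFormName)
{
   string rulesetDirectory = Path.Combine(masterFormSetDirectory, masterFormName, "Rulesets");
   if (!Directory.Exists(rulesetDirectory))
      return Array.Empty<string>();

   string[] files = Directory.GetFiles(rulesetDirectory, "*.json");
   Array.Sort(files, StringComparer.OrdinalIgnoreCase);
   return files;
}
```

Each returned path can be passed as the ruleset argument of RunRuleset(), and a form without rulesets simply yields no iterations.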

Add the following code to the new RunRuleset(string formToRecognize, string ruleset) method to use the Document Analyzer to run each ruleset for the corresponding form type and display the results:

C#
private static void RunRuleset(string formToRecognize, string ruleset) 
{ 
   // Dispose the document when the method returns 
   using LEADDocument document = DocumentFactory.LoadFromFile(formToRecognize, new LoadDocumentOptions()); 
   document.Text.OcrEngine = ocrEngine; 
 
   // Create Analyzer  
   DocumentAnalyzer analyzer = new DocumentAnalyzer() 
   { 
      Reader = new UnstructuredDataReader(), 
      QueryContext = new FileRepositoryContext(ruleset) 
   }; 
 
   DocumentAnalyzerRunOptions options = new DocumentAnalyzerRunOptions { ElementQuery = new RepositoryQuery() }; 
 
   List<ElementSetResult> results = analyzer.Run(document, options); 
 
   Console.WriteLine("Ruleset Results:"); 
   foreach (ElementSetResult result in results) 
      foreach (ElementResult item in result.Items) 
         Console.WriteLine($"{(item.GetFriendlyName())} = {(item.Value)}"); 
} 

Add the calls to the Main() method in order to process all of the existing rulesets for each recognized form type. Main() should look like this:

C#
static void Main(string[] args) 
{ 
   try 
   { 
      string projectRoot = Directory.GetParent(Environment.CurrentDirectory).Parent.Parent.FullName; 
      masterFormSetDirectory = Path.Combine(projectRoot, "MasterForm Sets"); 
      filledFormDirectory = Path.Combine(projectRoot, "FilledForms"); 
 
      // Startup 
      InitLEAD(); 
      InitFormsEngines(); 
 
      // Recognize forms 
      DirectoryInfo filledFormDir = new DirectoryInfo(filledFormDirectory); 
      FileInfo[] forms = filledFormDir.GetFiles(); 
      Console.WriteLine("# of Forms Detected: {0}\n", forms.Length); 
 
      foreach (FileInfo form in forms) 
      { 
         string currFormName = form.FullName; 
         AutoFormsRunResult runResult = RecognizeForm(currFormName); 
 
         // Process rulesets only when the form was recognized; RecognizeForm() returns null otherwise 
         if (runResult != null) 
         { 
            DirectoryInfo rulesetDir = new DirectoryInfo(GetRulesetDirectory(runResult.RecognitionResult.MasterForm.Name)); 
            FileInfo[] rulesets = rulesetDir.GetFiles("*.json"); 
            foreach (FileInfo ruleset in rulesets) 
            { 
               Console.WriteLine("Running Ruleset {0}...", ruleset.Name); 
               RunRuleset(currFormName, ruleset.FullName); 
            } 
         } 
         Console.WriteLine("========================================================================="); 
      } 
 
      // Cleanup 
      autoEngine.Dispose(); 
      if (ocrEngine != null && ocrEngine.IsStarted) 
         ocrEngine.Shutdown(); 
   } 
   catch (Exception ex) 
   { 
      Console.WriteLine(ex.Message); 
   } 
} 

Run the Project

Run the project by pressing F5, or by selecting Debug -> Start Debugging.

If the steps were followed correctly, the console appears and the application displays the recognized form along with the processed structured and unstructured fields. For this example, a 1040EZ, W4, and W9 have been included. For each, their structured and unstructured results will be displayed in the output console:

Recognition and Processing results displayed to the console.

Wrap-up

This tutorial showed how to recognize a form using the AutoFormsEngine class, process the form's structured and unstructured fields, and display the results to the console.


Help Version 23.0.2024.4.23
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.

