Document Analyzer High-Level Usage - Console C#

This tutorial shows how to create a C# Windows Console application to perform basic operations using the LEADTOOLS Document Analyzer SDK.

Overview

Summary: This tutorial shows how to perform basic DocumentAnalyzer operations.
Completion Time: 30 minutes
Visual Studio Project: Download tutorial project (7 KB)
Platform: C# Windows Console Application
IDE: Visual Studio 2017, 2019
Development License: Download LEADTOOLS

Required Knowledge

Before working on the Document Analyzer High-Level Usage - Console C# tutorial, get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial.

Create the Project and Add the LEADTOOLS References

Start with a copy of the project created in the Add References and Set a License tutorial. If you do not have that project, follow the steps in that tutorial to create it.

The references needed depend upon the purpose of the project. References can be added either as NuGet packages or as local DLLs, but not both.

If using NuGet references, this tutorial requires the following NuGet package:

If using local DLL references, the following DLLs are needed.

The DLLs are located at <INSTALL_DIR>\LEADTOOLS21\Bin\Dotnet4\x64:

For a complete list of which DLL files are required for your application, refer to Files to be Included With Your Application.

Set the License File

The license unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.

There are two types of runtime licenses:

Note

Adding LEADTOOLS NuGet and local references and setting a license are covered in more detail in the Add References and Set a License tutorial.

Add the Document Analyzer Feature Classes

With the project created, the references added, and the license set, coding can begin.

In Solution Explorer, open Program.cs. Add the following using directives at the top of Program.cs, replacing <PROJECT_NAME> with the name of your project:

C#
// Using block at the top 
using <PROJECT_NAME>.Tutorials; 
using Leadtools; 
using System; 
using System.IO; 

Right-click <PROJECT_NAME>.csproj and select Add -> New Folder. Name the folder Tutorials. This folder will contain six classes showcasing various features of the high-level Document Analyzer API. To add a new class to the Tutorials folder, right-click the folder and select Add -> New Item. Select Class and name the class. Add the six classes listed in the table below.

Class Name             Description
SaveLoad.cs            Create sample features, save to JSON, and load JSON.
StandardFeatures.cs    Create a standard date feature.
CustomFeatures.cs      Create a custom sample feature.
ExcludedFeatures.cs    Find emails with one exclusion.
ExecuteFeatures.cs     Create an engine to execute features.
LabeledFeatures.cs     Find emails and add a feature label.

Add the code below to the Main() method to run the various features highlighted in the newly created classes.

C#
static void Main(string[] args) 
{ 
    SetLicense(); 
 
    // Run the tutorial samples 
    SaveLoad.Run(); 
    StandardFeatures.Run(); 
    CustomFeatures.Run(); 
    ExcludedFeatures.Run(); 
    LabeledFeatures.Run(); 
    ExecuteFeatures.Run(); 
} 

Add the SaveLoad Class Code

In Solution Explorer, open SaveLoad.cs. Add the following statements to the using block at the top:

C#
using Leadtools.Document.Unstructured.Highlevel; 
using System.Collections.Generic; 

Add a new Run() method to the SaveLoad class. Add the code below to the Run() method to execute the features in this class.

C#
public static void Run() 
{ 
    // Create sample features 
    var feature = SampleFeature(); 
 
    // Save to json 
    var json = feature.ToJson(); 
 
    // Load from json 
    var loaded = FeatureResourceBuilder.Build(json); 
} 

Add a new method named SampleFeature(), which returns the IFeature object used by the Run() method. IFeature is the base type for features created to extract form information using automated unstructured forms processing.

Add the code below to the SampleFeature() method to create a custom sample feature.

C#
private static IFeature SampleFeature() 
{ 
    // Create a sample custom feature 
    var sample = new CustomFeature() 
    { 
        Name = "Sample", 
        Value = new List<InfoValue>() { new InfoValue() { Pattern = @"\d" } } 
    }; 
 
    return sample; 
} 

Add the StandardFeatures Class Code

In Solution Explorer, open StandardFeatures.cs. Add the following statements to the using block at the top:

C#
using Leadtools.Document.Unstructured.Highlevel; 
using System.Collections.Generic; 

Add a new Run() method to the StandardFeatures class. Add the code below to the Run() method to execute the features in this class.

C#
public static void Run() 
{ 
    // Date 
    var std_feature = StandardDate(); 
 
    // All features 
    var std_all = AllStandardFeatures(); 
} 

Add two new methods named StandardDate() and AllStandardFeatures(). Both methods are called inside the Run() method and return the IFeature objects used for data extraction.

Add the code below to the StandardDate() method to create a standard date feature.

C#
private static IFeature StandardDate() 
{ 
    // Standard date feature 
    var std_date = new StandardFeature() { ValueName = "Date", Name = "Tutorial_Date" }; 
 
    return std_date; 
} 

Add the code below to the AllStandardFeatures() method to create a list of features from all the regular expressions in the built-in database.

C#
private static IEnumerable<IFeature> AllStandardFeatures() 
{ 
    foreach (var value in RegexExpressionDb.List("value")) 
    { 
        var std = new StandardFeature() { ValueName = value, Name = value }; 
 
        yield return std; 
    } 
} 
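
Note that AllStandardFeatures() is a C# iterator method: because it uses yield return, its body does not execute until the returned sequence is enumerated. The sketch below illustrates this with standard .NET types only. The DeferredDemo class, its Names() method, and the Produced counter are hypothetical stand-ins and are not part of the LEADTOOLS API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many items the iterator body has produced so far.
    public static int Produced = 0;

    // Hypothetical stand-in for AllStandardFeatures(): yields one name per input value.
    public static IEnumerable<string> Names(IEnumerable<string> values)
    {
        foreach (var value in values)
        {
            Produced++;
            yield return "Feature_" + value;
        }
    }

    public static void Main()
    {
        // Calling the iterator only creates the sequence; no item is produced yet.
        var lazy = Names(new[] { "Date", "Email" });
        Console.WriteLine("Produced before enumeration: " + Produced);

        // ToList() enumerates the sequence and runs the iterator body.
        List<string> materialized = lazy.ToList();
        Console.WriteLine("Produced after enumeration: " + Produced);
        Console.WriteLine(string.Join(", ", materialized));
    }
}
```

In the tutorial's Run() method, iterating the result of AllStandardFeatures() with foreach, or calling ToList() on it, would likewise force the features to be created.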

Add the CustomFeatures Class Code

In Solution Explorer, open CustomFeatures.cs. Add the following statements to the using block at the top:

C#
using Leadtools.Document.Unstructured.Highlevel; 
using System.Collections.Generic; 

Add a new Run() method to the CustomFeatures class. Add the code below to the Run() method to execute the features in this class.

C#
public static void Run() 
{ 
    // Custom feature to find (demo) banking account number 
    var custom_feature = Account(); 
} 

Add a new method named Account(). Add the code below to the Account() method to return the feature created to find the bank account number.

C#
public static IFeature Account() 
{ 
    var acc = new CustomFeature() { Name = "Account" }; 
 
    acc.Label = new List<InfoLabel>() 
    { 
        new InfoLabel() 
        { 
            Value = new InfoValue() 
            { 
                Pattern="account(\\s)?number", 
                Tweaks=new RegexTweaks() 
                { 
                    IgnoreCase=true, 
                    IgnoreWhiteSpace=false, 
                    FuzzyMatching=FuzzyMatching.Auto, 
                    IgnoreIfShorterThan=8, 
                    LettersToNumbers=false, 
                    MatchWholeWord=false, 
                }, 
                TweaksForResults=new RegexResultsTweaks() 
                { 
                    IncludeWholeWord=false, 
                    IncludeWholeLine=false, 
                } 
            }, 
            Where = ECLocation.Right, 
            LocationProximity=5, 
        }, 
        new InfoLabel() 
        { 
            Value = new InfoValue() 
            { 
                Pattern="loan(\\s)?number", 
                Tweaks=new RegexTweaks() 
                { 
                    IgnoreCase=true, 
                    IgnoreWhiteSpace=false, 
                    FuzzyMatching=FuzzyMatching.Auto, 
                    IgnoreIfShorterThan=8, 
                    LettersToNumbers=false, 
                    MatchWholeWord=false, 
                }, 
                TweaksForResults=new RegexResultsTweaks() 
                { 
                    IncludeWholeWord=false, 
                    IncludeWholeLine=false, 
                } 
            }, 
            Where = ECLocation.Right, 
            LocationProximity =5, 
        }, 
        new InfoLabel() 
        { 
            Value = new InfoValue() 
            { 
                Pattern="brokerage(\\s)?cash(\\s)?number", 
                Tweaks=new RegexTweaks() 
                { 
                    IgnoreCase=true, 
                    IgnoreWhiteSpace=false, 
                    FuzzyMatching=FuzzyMatching.Auto, 
                    IgnoreIfShorterThan=10, 
                    LettersToNumbers=false, 
                    MatchWholeWord=false, 
                }, 
                TweaksForResults=new RegexResultsTweaks() 
                { 
                    IncludeWholeWord=false, 
                    IncludeWholeLine=false, 
                } 
            }, 
            Where = ECLocation.Right, 
            LocationProximity =5, 
        }, 
        new InfoLabel() 
        { 
            Value = new InfoValue() 
            { 
                Pattern="Account(\\s)No.", 
                Tweaks=new RegexTweaks() 
                { 
                    IgnoreCase=false, 
                    IgnoreWhiteSpace=false, 
                    FuzzyMatching=FuzzyMatching.Auto, 
                    IgnoreIfShorterThan=8, 
                    LettersToNumbers=false, 
                    MatchWholeWord=false, 
                }, 
                TweaksForResults=new RegexResultsTweaks() 
                { 
                    IncludeWholeWord=false, 
                    IncludeWholeLine=false, 
                } 
            }, 
            Where = ECLocation.Right, 
            LocationProximity =5, 
        }, 
    }; 
 
    acc.Value = new List<InfoValue>() 
    { 
        new InfoValue() 
        { 
            Pattern = "\\d{3,4}(-)?\\d{3,14}", 
            Tweaks = new RegexTweaks() 
            { 
                FuzzyMatching=FuzzyMatching.Auto, 
                IgnoreCase=true, 
                IgnoreWhiteSpace=true, 
                IgnoreIfShorterThan=5, 
                LettersToNumbers=true, 
                MatchWholeWord=false, 
            }, 
            TweaksForResults = new RegexResultsTweaks() 
            { 
                IncludeWholeWord=true, 
                IncludeWholeLine=false 
            } 
        }, 
        new InfoValue() 
        { 
            Pattern = "\\d{3,4}(-|\\s)?\\d{3,6}(-|\\s)?\\d{3,6}", 
            Tweaks = new RegexTweaks() 
            { 
                FuzzyMatching=FuzzyMatching.Auto, 
                IgnoreCase=true, 
                IgnoreWhiteSpace=true, 
                IgnoreIfShorterThan=5, 
                LettersToNumbers=true, 
                MatchWholeWord=false, 
            }, 
            TweaksForResults = new RegexResultsTweaks() 
            { 
                IncludeWholeWord=true, 
                IncludeWholeLine=false 
            } 
        } 
    }; 
 
    return acc; 
} 
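
To get a feel for what the label and value patterns match, they can be tried with the standard .NET Regex class. This is only an approximation: it does not model the SDK's RegexTweaks settings (fuzzy matching, LettersToNumbers, and so on), and the PatternDemo class and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // First label pattern and first value pattern from the Account() feature,
    // written here as verbatim strings.
    public const string LabelPattern = @"account(\s)?number";
    public const string ValuePattern = @"\d{3,4}(-)?\d{3,14}";

    // Returns true when the text contains the label, ignoring case.
    public static bool IsLabel(string text) =>
        Regex.IsMatch(text, LabelPattern, RegexOptions.IgnoreCase);

    // Returns the first account-like number in the text, or null if none is found.
    public static string FindValue(string text)
    {
        Match m = Regex.Match(text, ValuePattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string line = "Account Number: 1234-567890";
        Console.WriteLine(IsLabel(line));    // True
        Console.WriteLine(FindValue(line));  // 1234-567890
    }
}
```

The value pattern needs at least six digits in total (three or four, an optional hyphen, then three to fourteen), so a short run such as 12345 is not matched.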

Add the ExcludedFeatures Class Code

In Solution Explorer, open ExcludedFeatures.cs. Add the following statements to the using block at the top:

C#
using Leadtools.Document.Unstructured.Highlevel; 
using System.Collections.Generic; 

Add a new Run() method to the ExcludedFeatures class. Add the code below to the Run() method to execute the features in this class.

C#
public static void Run() 
{ 
    // Feature to find emails excluding "info@leadtools.com" 
    var features = new List<IFeature>() 
    { 
        // Emails matching 
        new StandardFeature(){ValueName="Email"}, 
 
        // Excluding the exact email below 
        ExcludeExact("info@leadtools.com") 
    }; 
    // Now we have a list of features, if executed, it will match all emails except for info@leadtools.com 
} 

Add a new method named ExcludeExact() to the ExcludedFeatures class. This method is called in the Run() method above. Add the code below to the new method. It creates a feature that matches the given text exactly (not as a regular expression) and marks it as excluded, so that matches of that text are left out of the results.

C#
private static IFeature ExcludeExact(string text) 
{ 
    var ex = new CustomFeature() { Name = "Excluded" }; 
    ex.Value = new List<InfoValue>() 
    { 
        new InfoValue() 
        { 
            Pattern = text, 
            PatternIsRegex = false, 
            Tweaks = new RegexTweaks(), 
            TweaksForResults = new RegexResultsTweaks() 
        } 
    }; 
 
    ex.Excluded = true; 
    return ex; 
} 
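
The intended outcome of pairing the Email feature with an excluded feature can be pictured with plain .NET code. The sketch below is not how the SDK implements exclusion. It simply finds email-like strings using a simplified, hypothetical regular expression and drops any exact match of the excluded text. The ExclusionDemo class and its method are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExclusionDemo
{
    // Simplified email pattern for illustration only
    // (an assumption, not the SDK's built-in "Email" expression).
    private const string EmailPattern = @"[\w.+-]+@[\w-]+(\.[\w-]+)+";

    // Returns every email-like match in the text except exact matches of 'excluded'.
    public static List<string> FindEmailsExcept(string text, string excluded)
    {
        return Regex.Matches(text, EmailPattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(v => !string.Equals(v, excluded, StringComparison.Ordinal))
            .ToList();
    }

    public static void Main()
    {
        string text = "Write to info@leadtools.com or sales@example.com today";
        foreach (string email in FindEmailsExcept(text, "info@leadtools.com"))
        {
            Console.WriteLine(email);  // sales@example.com
        }
    }
}
```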

Add the ExecuteFeatures Class Code

In Solution Explorer, open ExecuteFeatures.cs. Add the following statements to the using block at the top:

C#
using System.Collections.Generic; 
using System.Threading; 
using Leadtools.Document; 
using Leadtools.Document.Unstructured.Highlevel; 

Add a new Run() method to the ExecuteFeatures class. This class shows how to run the FeaturesProcessingEngine to extract data from a loaded document based on the created features. Add the code below to the Run() method.

C#
public static void Run() 
{ 
    // Custom feature to find (demo) banking account number 
    var custom_feature = Account(); 
 
    // Load a target document; replace the placeholder with the path to your file 
    var doc_file_name = @"INSERT FILE PATH TO TARGET DOCUMENT"; 
    using (var document = DocumentFactory.LoadFromFile(doc_file_name, new LoadDocumentOptions())) 
    { 
        // Create engine to run and execute features 
        var engine = new FeaturesProcessingEngine(true); 
 
        // Block until the asynchronous run completes, so Main() does not return before the results are ready 
        var results = engine.Run(new List<IFeature>() { custom_feature }, document, CancellationToken.None).GetAwaiter().GetResult(); 
    } 
} 

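The Account() method called in the Run() method is the same Account() method defined in the CustomFeatures class, so copy that code into the ExecuteFeatures class. Because Account() is declared public static in CustomFeatures, calling CustomFeatures.Account() directly is an alternative to copying it.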
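
The call to engine.Run() is asynchronous, so a console application has to wait for it to finish before Main() returns. One common .NET pattern is to have the method return a Task and await it from an async Main. The sketch below shows that pattern with standard .NET types only. ExtractAsync is a hypothetical stand-in for the engine call, and async Main assumes C# 7.1 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDemo
{
    // Hypothetical stand-in for an asynchronous extraction call.
    public static async Task<int> ExtractAsync(CancellationToken token)
    {
        await Task.Delay(50, token);  // simulate work
        return 3;                     // pretend three results were found
    }

    // Returning Task<int> (rather than void) lets the caller wait for completion.
    public static async Task<int> RunAsync()
    {
        int count = await ExtractAsync(CancellationToken.None);
        return count;
    }

    // An async Main keeps the process alive until the awaited work is done.
    public static async Task Main()
    {
        int count = await RunAsync();
        Console.WriteLine("Results: " + count);
    }
}
```

By contrast, a method declared async void cannot be awaited, so its caller has no way to know when the work has finished.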

Add the LabeledFeatures Class Code

In Solution Explorer, open LabeledFeatures.cs. Add the following statements to the using block at the top:

C#
using Leadtools.Document.Unstructured.Highlevel; 
using System.Collections.Generic; 

Add a new Run() method to the LabeledFeatures class. Add the code below to the Run() method to execute the features in this class.

C#
public static void Run() 
{ 
    // Feature for Emails matching 
    var feature = new StandardFeature() { ValueName = "Email" }; 
 
    // Add label  
    AddLabel(feature, "email:"); 
} 

Add a new method to the LabeledFeatures class named AddLabel(StandardFeature feature, string labelText). Add the code below to the new method to attach a custom label to a standard feature.

C#
private static void AddLabel(StandardFeature feature, string labelText) 
{ 
    feature.CustomLabel = true; 
    feature.CustomLabels = new List<InfoLabel>() 
    { 
        new InfoLabel() 
        { 
            Value = new InfoValue() 
            { 
                // Exact matching label text 
                Pattern = labelText, 
                PatternIsRegex = false, 
            }, 
 
            // Location 
            Where = ECLocation.Right, 
            // Proximity 
            LocationProximity = 5, 
        }, 
    }; 
} 

Run the Project

Run the project by pressing F5, or by selecting Debug -> Start Debugging.

If the steps were followed correctly, the console appears and the application executes the code for each sample feature class. To test the ExecuteFeatures class code, set the doc_file_name string to the file path of your test document.

Wrap-up

This tutorial showed how to use the LEADTOOLS Document Analyzer to perform high-level API operations.

Help Version 21.0.2021.5.11
© 1991-2021 LEAD Technologies, Inc. All Rights Reserved.