Manually Recognize and Process a Form - Java

This tutorial shows how to manually recognize a filled form sample against a collection of MasterForm templates, then manually process the form to retrieve information using the LEADTOOLS Low-Level Form Interface. This offers more control over the process than doing it automatically.

Overview  
Summary: This tutorial covers how to manually recognize and process a form using the LEADTOOLS Low-Level Form Interface in a Java application.
Completion Time: 30 minutes
Visual Studio Project: Download tutorial project (443 KB)
Platform: Java Application
IDE: Eclipse / IntelliJ
Development License: Download LEADTOOLS

Required Knowledge

Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial before working on the Manually Recognize and Process a Form - Java tutorial.

Create the Project and Add LEADTOOLS References

Start with a copy of the project created in the Add References and Set a License tutorial. If that project is unavailable, follow the steps in that tutorial to create it.

The references needed depend upon the purpose of the project. References can be added as local .jar files located at <INSTALL_DIR>\LEADTOOLS23\Bin\Java.

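For this project, the references needed are the JAR files that provide the packages imported in the code later in this tutorial: leadtools, leadtools.codecs, leadtools.document.writer, leadtools.forms.common, leadtools.forms.processing, leadtools.forms.recognition, leadtools.forms.recognition.ocr, and leadtools.ocr.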

For a complete list of which JAR files are required for your application, refer to Files to be Included with your Java Application.

Set the License File

The License unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.

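There are two types of runtime licenses: an evaluation license, which is obtained when the evaluation toolkit is downloaded and allows the toolkit to be evaluated, and a deployment license.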

Note: Adding LEADTOOLS references and setting a license are covered in more detail in the Add References and Set a License tutorial.

Add the Form Recognize and Process Code

With the project created, the references added, and the license set, coding can begin.

Open the _Main.java class in the Package Explorer. Rename the _Main.java class to ManuallyRecognizeAndProcessAFormTutorial.java. Add the following statements to the import block at the top.

Java
import java.io.File; 
import java.io.IOException; 
import java.nio.file.Files; 
import java.nio.file.Paths; 
import java.util.*; 
 
import leadtools.*; 
import leadtools.codecs.*; 
import leadtools.document.writer.*; 
import leadtools.forms.common.*; 
import leadtools.forms.processing.*; 
import leadtools.forms.recognition.*; 
import leadtools.forms.recognition.ocr.*; 
import leadtools.ocr.*; 

Modify the main() method to create a new instance of the ManuallyRecognizeAndProcessAFormTutorial class and call its run() method, optionally passing in the program arguments so they can be used later. The run() method will be defined next.

Java
public static void main(String[] args) { 
    new ManuallyRecognizeAndProcessAFormTutorial().run(args); 
} 
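
The program arguments passed to run() could, for example, override the hard-coded path of the form to recognize. The following is a minimal sketch of that idea using only the Java standard library. The FormPathResolver class and its resolve() method are hypothetical helpers introduced for illustration; they are not part of LEADTOOLS.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical helper (not part of LEADTOOLS): chooses which form file to recognize.
final class FormPathResolver {
    private FormPathResolver() { }

    // Returns the first program argument when it names an existing regular file;
    // otherwise falls back to the supplied default path.
    static String resolve(String[] args, String defaultPath) {
        if (args == null || args.length == 0 || args[0] == null) {
            return defaultPath;
        }
        String candidate = args[0].trim();
        try {
            if (!candidate.isEmpty() && Files.isRegularFile(Paths.get(candidate))) {
                return candidate;
            }
        } catch (InvalidPathException ex) {
            // The argument is not a syntactically valid path; use the default instead.
        }
        System.err.println("Ignoring argument, not an existing file: " + candidate);
        return defaultPath;
    }
}
```

Under these assumptions, run() could call RecognizeForm(FormPathResolver.resolve(args, formToRecognize)) instead of passing formToRecognize directly.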

Inside the run() method, add the following to set the library path to where the C DLL files are located, as well as load the LEADTOOLS libraries that were previously imported.

Java
private void run(String[] args) { 
	try { 
		Platform.setLibPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64"); 
			 
		Platform.loadLibrary(LTLibrary.LEADTOOLS); 
		Platform.loadLibrary(LTLibrary.CODECS); 
		Platform.loadLibrary(LTLibrary.OCR); 
			 
		SetLicense(); 
		InitFormsEngines(); 
		CreateMasterFormAttributes(); 
		RecognizeForm(formToRecognize);		 
	} catch (Exception ex){ 
		System.err.println(ex.getMessage()); 
		ex.printStackTrace(); 
	} finally { 
		if (ocrEngine != null) { 
			ocrEngine.shutdown(); 
			ocrEngine.dispose(); 
		} 
		if (codecs != null) 
			codecs.dispose(); 
	} 
} 

Note: The OcrEngine and RasterCodecs instances must be disposed of after they are used in order to properly free their resources, as shown above with the calls to the dispose() method.
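
The cleanup order matters here: the OCR engine is shut down before the RasterCodecs instance it was started with is disposed. One way to keep that ordering automatic is a small cleanup stack used with try-with-resources. This is only a sketch of the general pattern. CleanupStack is a hypothetical helper, not a LEADTOOLS class, and it makes no assumption about whether the LEADTOOLS classes implement AutoCloseable themselves.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not part of LEADTOOLS): runs registered cleanup actions
// in reverse order of registration when the try-with-resources block exits.
final class CleanupStack implements AutoCloseable {
    private final Deque<Runnable> actions = new ArrayDeque<>();

    // Registers a cleanup action, for example () -> codecs.dispose().
    void onClose(Runnable action) {
        actions.push(action);
    }

    @Override
    public void close() {
        while (!actions.isEmpty()) {
            try {
                actions.pop().run();
            } catch (RuntimeException ex) {
                // Keep going so that one failed cleanup does not skip the others.
                System.err.println("Cleanup failed: " + ex.getMessage());
            }
        }
    }
}
```

With this helper, each resource would be registered right after it is created (first the codecs, then the OCR engine), so the engine's cleanup runs first and the codecs' cleanup runs last, matching the finally block above.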

Add the member variables below to the ManuallyRecognizeAndProcessAFormTutorial class.

Java
private OcrEngine ocrEngine = null; 
private RasterCodecs codecs = null; 
private FormRecognitionEngine recognitionEngine = null; 
private FormProcessingEngine processingEngine = null; 
private String masterformDir = "C:\\LEADTOOLS23\\Resources\\Images\\Forms\\MasterForm Sets\\OCR"; 
private String formToRecognize = "C:\\LEADTOOLS23\\Resources\\Images\\Forms\\Forms to be Recognized\\OCR\\W9_OCR_Filled.tif"; 
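
Because both paths are hard-coded, a wrong install location only shows up later as a load failure. A quick check before starting the engines can make that easier to diagnose. The sketch below uses only the Java standard library; InputPathCheck and findProblems() are hypothetical names chosen for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of LEADTOOLS): verifies the tutorial's input paths.
final class InputPathCheck {
    private InputPathCheck() { }

    // Returns a human-readable message for each path that is missing or unreadable.
    // An empty list means both paths look usable.
    static List<String> findProblems(String masterFormDir, String formFile) {
        List<String> problems = new ArrayList<>();

        Path dir = Paths.get(masterFormDir);
        if (!Files.isDirectory(dir)) {
            problems.add("Master form directory not found: " + dir);
        }

        Path form = Paths.get(formFile);
        if (!Files.isRegularFile(form) || !Files.isReadable(form)) {
            problems.add("Form to recognize not found or not readable: " + form);
        }
        return problems;
    }
}
```

A caller could print the returned messages and stop early when the list is not empty, for example by checking InputPathCheck.findProblems(masterformDir, formToRecognize) at the start of run().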

Inside the ManuallyRecognizeAndProcessAFormTutorial class, add three new methods named InitFormsEngines(), CreateMasterFormAttributes(), and RecognizeForm(String unindentifiedForm). Call all three methods in order inside the run() method, as shown above. The String argument passed to RecognizeForm() is the file path of the form to recognize. This tutorial uses the following TIFF file: C:\LEADTOOLS23\Resources\Images\Forms\Forms to be Recognized\OCR\W9_OCR_Filled.tif

Add the code below to the InitFormsEngines() method to initialize the FormRecognitionEngine, FormProcessingEngine, and set up the OcrEngine.

Java
private void InitFormsEngines() { 
	try { 
		System.out.println("Initializing engines..."); 
		codecs = new RasterCodecs(); 
		codecs.getOptions().getRasterizeDocument().getLoad().setResolution(300); 
		 
		recognitionEngine = new FormRecognitionEngine(); 
		processingEngine = new FormProcessingEngine(); 
		 
		ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
		ocrEngine.startup(codecs, new DocumentWriter(), null, "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"); 
		 
		OcrObjectsManager ocrObjectsManager = new OcrObjectsManager(ocrEngine); 
		recognitionEngine.getObjectsManagers().add(ocrObjectsManager); 
		 
		processingEngine.setOcrEngine(ocrEngine); 
		 
		System.out.println("Engines initialized successfully."); 
		 
	} catch (Exception ex) { 
		System.err.println(ex.getMessage()); 
		ex.printStackTrace(); 
	} 
} 

Inside the CreateMasterFormAttributes() method, add the code below to create a .bin file for each master form. Each file contains the master form's attributes, which are used to pair the master form with the filled form during forms recognition.

Java
private void CreateMasterFormAttributes() { 
	System.out.println("Processing MasterForm..."); 
 
	// Iterate over all .tif images in folder 
	File masterformOcrFolder = new File(masterformDir); 
	for (File masterformFile : masterformOcrFolder.listFiles((filename) ->  
	filename.toString().toLowerCase().endsWith(".tif"))) { 
		String masterformfileName = masterformFile.toString(); 
		String masterformName = masterformfileName.substring( 
				masterformfileName.lastIndexOf(File.separator)+1, 
				masterformfileName.lastIndexOf(".")); 
		 
		RasterImage image = null; 
		try { 
			image = codecs.load(masterformfileName, 0, CodecsLoadByteOrder.BGR_OR_GRAY, 1, -1); 
			FormRecognitionAttributes masterFormAttributes = recognitionEngine.createMasterForm(masterformName, UUID.randomUUID(), null); 
			 
			// Get form attributes for each page 
			for (int i=0; i<image.getPageCount(); i++) { 
				image.setPage(i+1); 
				recognitionEngine.addMasterFormPage(masterFormAttributes, image, null); 
			} 
			recognitionEngine.closeMasterForm(masterFormAttributes); 
			 
			// Write the attributes to a file 
			Files.write(Paths.get(masterformName + ".bin"), masterFormAttributes.getData()); 
		} catch (Exception ex) { 
			System.err.println(ex.getMessage()); 
			ex.printStackTrace(); 
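		} finally {
			// Dispose only the loaded image here. The shared RasterCodecs instance is
			// still needed for the remaining master forms and for RecognizeForm();
			// it is disposed in run().
			if (image != null)
				image.dispose();
		}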
	} 
	System.out.println("MasterForm Processing Complete."); 
} 
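
The method above derives each master form's name by slicing the full path string, and it assumes listFiles() never returns null (it does when the directory is missing or unreadable). The following sketch shows one way to make both steps more defensive with the Java standard library alone. MasterFormFiles, listByExtension(), and baseName() are hypothetical names used only for this illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of LEADTOOLS): file handling for master form images.
final class MasterFormFiles {
    private MasterFormFiles() { }

    // Returns the entries in dir whose names end with the given extension
    // (case-insensitive). Returns an empty list when dir cannot be listed.
    static List<File> listByExtension(File dir, String extension) {
        String suffix = extension.toLowerCase(Locale.ROOT);
        File[] matches = dir.listFiles(
                (parent, name) -> name.toLowerCase(Locale.ROOT).endsWith(suffix));
        if (matches == null) {
            return new ArrayList<>();
        }
        Arrays.sort(matches);
        return new ArrayList<>(Arrays.asList(matches));
    }

    // Returns the file name without its directory and without its last extension.
    static String baseName(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }
}
```

With these helpers, the loop could iterate over MasterFormFiles.listByExtension(new File(masterformDir), ".tif") and use MasterFormFiles.baseName(masterformFile) as the master form name.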

Add the code below to the RecognizeForm() method to load the given form as a RasterImage and run forms recognition to pair the filled form with its corresponding master form.

Java
private void RecognizeForm(String unindentifiedForm) { 
	System.out.println("Recognizing Form..."); 
	 
	String projectDirectory = System.getProperty("user.dir"); 
	RasterImage image  = null; 
	try { 
		image = codecs.load(unindentifiedForm, 0, CodecsLoadByteOrder.BGR_OR_GRAY, 1, -1); 
		 
		FormRecognitionAttributes filledFormAttributes = recognitionEngine.createForm(null); 
		for (int i=0; i<image.getPageCount(); i++) { 
			image.setPage(i+1); 
			recognitionEngine.addFormPage(filledFormAttributes, image, null); 
		} 
		recognitionEngine.closeForm(filledFormAttributes); 
		 
		boolean found = false; 
		File folder = new File(projectDirectory); 
		File[] folderBinFiles = folder.listFiles((filename) -> filename.toString().toLowerCase().endsWith(".bin")); 
		for (File masterformBinFile : folderBinFiles){ 
			 
			String masterformBinFileName = masterformBinFile.toString(); 
			String masterformName = masterformBinFileName.substring( 
					masterformBinFileName.lastIndexOf(File.separator)+1, 
					masterformBinFileName.lastIndexOf(".")); 
			String masterformXmlFilename = masterformDir + "\\" + masterformName + ".xml"; 
			processingEngine.loadFields(masterformXmlFilename); 
			 
			// Compare filled form attributes to masterform attributes 
			FormRecognitionAttributes masterformAttributes = new FormRecognitionAttributes(); 
			masterformAttributes.setData(Files.readAllBytes(masterformBinFile.toPath())); 
			FormRecognitionResult recognitionResult = recognitionEngine.compareForm(masterformAttributes, filledFormAttributes, null); 
			 
			// If confidence>=80, then we've found the masterform 
			if (recognitionResult.getConfidence() >= 80) { 
				List<PageAlignment> alignments = new ArrayList<PageAlignment>(); 
				for (PageRecognitionResult pageResult : recognitionResult.getPageResults()) 
					alignments.add(pageResult.getAlignment()); 
 
				System.out.println("This form has been recognized as a " + masterformName + ":\n"); 
				ProcessForm(image, alignments); 
				found = true; 
				break; 
			} 
		} 
		if (!found) 
			System.out.println("The form could not be recognized."); 
	} catch (Exception ex) { 
		System.err.println(ex.getMessage()); 
		ex.printStackTrace(); 
	} finally { 
		if (image != null) 
			image.dispose(); 
	} 
} 
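
The loop above stops at the first master form whose confidence is 80 or higher. When several templates look alike, it may be preferable to compare against all of them and keep the highest score. The sketch below illustrates only that selection step, with confidence values supplied as plain integers (assumed to have been collected from getConfidence() for each master form). BestMatch and pick() are hypothetical names, not LEADTOOLS API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of LEADTOOLS): picks the best-scoring master form.
final class BestMatch {
    private BestMatch() { }

    // Returns the name with the highest confidence that is at least minimumConfidence,
    // or an empty Optional when no candidate reaches the threshold.
    // On a tie, the candidate encountered first in iteration order is kept.
    static Optional<String> pick(Map<String, Integer> confidenceByMasterForm,
                                 int minimumConfidence) {
        String bestName = null;
        int bestConfidence = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> entry : confidenceByMasterForm.entrySet()) {
            int confidence = entry.getValue();
            if (confidence >= minimumConfidence && confidence > bestConfidence) {
                bestName = entry.getKey();
                bestConfidence = confidence;
            }
        }
        return Optional.ofNullable(bestName);
    }
}
```

The trade-off is speed: the first-match approach in the tutorial can stop early, while a best-match approach always compares the filled form against every master form.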

Create a new method inside the ManuallyRecognizeAndProcessAFormTutorial class named ProcessForm(RasterImage image, List<PageAlignment> alignments). Call this method inside the RecognizeForm() method, as shown above. Add the code below to the ProcessForm() method to display the filled form's processed results to the console.

Java
private void ProcessForm(RasterImage image, List<PageAlignment> alignments) { 
	String resultsMessage = ""; 
	 
	// Process the form, then collect the text value of every field on every page
	processingEngine.process(image, alignments); 
	for (FormPage formPage : processingEngine.getPages()) 
		for (FormField formField : formPage) 
			if (formField != null) 
				// For the sake of demonstration, assume all fields are text 
				resultsMessage += String.format("%s at %s = %s\n", 
						formField.getName(), 
						formField.getBounds().toString(), 
						((TextFormFieldResult)formField.getResult()).getText()); 
	 
	if (resultsMessage.isEmpty()) 
		System.out.println("No fields were processed."); 
	else 
		System.out.println(resultsMessage); 
} 
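
The method above casts every field result to TextFormFieldResult, which throws a ClassCastException if a master form defines a field of another type. A type check before the cast avoids that. The sketch below demonstrates the pattern with stand-in types so that it runs without the toolkit; FieldResult, TextResult, OtherResult, and Field are hypothetical placeholders, not the LEADTOOLS classes.

```java
import java.util.List;

// Illustration only: the nested types are stand-ins, NOT the LEADTOOLS classes.
final class FieldResultFormatting {
    private FieldResultFormatting() { }

    interface FieldResult { }

    static final class TextResult implements FieldResult {
        final String text;
        TextResult(String text) { this.text = text; }
    }

    static final class OtherResult implements FieldResult { }

    static final class Field {
        final String name;
        final FieldResult result;
        Field(String name, FieldResult result) {
            this.name = name;
            this.result = result;
        }
    }

    // Builds one line per non-null field. Text results show their text;
    // any other result type is reported instead of being cast blindly.
    static String describe(List<Field> fields) {
        StringBuilder message = new StringBuilder();
        for (Field field : fields) {
            if (field == null) {
                continue;
            }
            if (field.result instanceof TextResult) {
                String text = ((TextResult) field.result).text;
                message.append(field.name).append(" = ").append(text).append('\n');
            } else {
                message.append(field.name).append(" = (non-text result)\n");
            }
        }
        return message.toString();
    }
}
```

The same instanceof check could be applied to formField.getResult() before the cast in ProcessForm(). Using a StringBuilder also avoids creating a new String on every loop iteration.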

Run the Project

Run the project by pressing Ctrl + F11, or by selecting Run -> Run.

If the steps are followed correctly, the application creates identifying attributes for each master form at the given directory, loads the specified filled form, runs forms recognition to pair the filled form with the appropriate master form, then processes the filled form and displays the processed information to the console.

Wrap-Up

This tutorial showed how to use the LEADTOOLS Low-Level Form Interface to identify an image as a specific form and then extract all text field values and display them in the console. It also covered how to use the OcrEngine, FormRecognitionEngine, FormProcessingEngine, FormRecognitionAttributes, and FormRecognitionResult classes.

Help Version 23.0.2024.3.11
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.

