
Preprocess an Image for OCR - Java

This tutorial shows how to create a Java application that uses the LEADTOOLS SDK to preprocess images for OCR.

Overview
Summary	This tutorial covers how to use LEADTOOLS Image Processing SDK technology in a Java application
Completion Time	30 minutes
Project	Download tutorial project (2 KB)
Platform	Java Application
IDE	Eclipse
Runtime License	Download LEADTOOLS
Try it in another language	C#: .NET 6+ (Console), .NET 6+ (WinForms); Java: Java; Python: Python

Required Knowledge

Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial before working on the Preprocess an Image for OCR - Java tutorial.

Create the Project and Add LEADTOOLS References

Start with a copy of the project created in the Add References and Set a License tutorial. If that project is unavailable, follow the steps in that tutorial to create it.

The references needed depend upon the purpose of the project. This tutorial requires the following JAR files, which are located at <INSTALL_DIR>\LEADTOOLS23\Bin\Java:

leadtools.jar
leadtools.codecs.jar
leadtools.document.writer.jar
leadtools.ocr.jar

For a complete list of which JAR files are required for your application, refer to Files to be Included with your Java Application.
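A missing JAR usually surfaces later as a confusing class-loading error, so it can help to confirm that the four files listed above are actually present before configuring the build path. The following is a minimal sketch using only the Java standard library. The class name JarCheck, the method missingJars, and the default directory are hypothetical helpers for illustration and are not part of the LEADTOOLS SDK.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
final class JarCheck {
    // JAR names copied from the list in this tutorial.
    static final List<String> REQUIRED_JARS = List.of(
            "leadtools.jar",
            "leadtools.codecs.jar",
            "leadtools.document.writer.jar",
            "leadtools.ocr.jar");

    // Returns the required JAR names that are not regular files inside jarDir.
    static List<String> missingJars(Path jarDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_JARS) {
            if (!Files.isRegularFile(jarDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed default location; pass the real <INSTALL_DIR>\LEADTOOLS23\Bin\Java path as the first argument.
        Path jarDir = Paths.get(args.length > 0 ? args[0] : "C:\\LEADTOOLS23\\Bin\\Java");
        List<String> missing = missingJars(jarDir);
        System.out.println(missing.isEmpty() ? "All required JARs found." : "Missing: " + missing);
    }
}
```

Run it with the JAR directory as the only argument. An empty result means all four files from the list exist in that directory; it does not verify their versions.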

Set the License File

The license unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.

There are two types of runtime licenses:

Evaluation license, obtained at the time the evaluation toolkit is downloaded. It allows the toolkit to be evaluated.
Deployment license. If a Deployment license file and developer key are needed, refer to Obtaining a License.

Note: Adding LEADTOOLS references and setting a license are covered in more detail in the Add References and Set a License tutorial.

Add the Image Preprocessing and OCR Code

With the project created, the references added, and the license set, coding can begin.

In the Package Explorer, open the _Main.java file. Add the following import statements to the import block at the top.

Java

import java.io.IOException; 
import java.nio.file.Files; 
import java.nio.file.Paths; 
 
import leadtools.*; 
import leadtools.codecs.*; 
import leadtools.document.writer.*; 
import leadtools.ocr.*;

Add a new method called OCRPreprocess() to the _Main class. Call it inside the run() method, after the SetLicense() call.

Java

public static void main(String[] args) throws IOException { 
	new _Main().run(args); 
} 
 
private void run(String[] args) { 
	try { 
		Platform.setLibPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64"); 
		Platform.loadLibrary(LTLibrary.LEADTOOLS); 
		Platform.loadLibrary(LTLibrary.CODECS); 
		Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER); 
		Platform.loadLibrary(LTLibrary.OCR); 
			 
		SetLicense(); 
			 
		OCRPreprocess(); 
	} catch(Exception ex) { 
		System.err.println(ex.getMessage()); 
		ex.printStackTrace(); 
	} 
}

Define the OCRPreprocess() method as shown below. It loads the input TIFF, starts the LEAD OCR Engine, adds the image to an OCR document as a page, applies the deskew, invert, and rotate auto-preprocessing commands to that page, recognizes the text, and saves the result as a PDF.

Java

void OCRPreprocess() {
	// Input image and output PDF paths
	String tifFileName = "C:\\LEADTOOLS23\\Resources\\Images\\ocr1.tif";
	String pdfFileName = "C:\\LEADTOOLS23\\Resources\\Images\\cleanupTIF.pdf";

	// Load the source image
	RasterCodecs codecs = new RasterCodecs();
	RasterImage image = codecs.load(tifFileName);

	// Create and start the LEAD OCR Engine; the null arguments request the default work directory and startup parameters
	OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD);
	ocrEngine.startup(new RasterCodecs(), new DocumentWriter(), null, null);

	// Create an OCR document and add the image as a page (null: no progress callback)
	OcrDocument ocrDocument = ocrEngine.getDocumentManager().createDocument();
	OcrPage ocrPage = ocrDocument.getPages().addPage(image, null);

	// Auto-preprocess the page: straighten skew, fix inverted colors, correct orientation
	ocrPage.autoPreprocess(OcrAutoPreprocessPageCommand.DESKEW, null);
	ocrPage.autoPreprocess(OcrAutoPreprocessPageCommand.INVERT, null);
	ocrPage.autoPreprocess(OcrAutoPreprocessPageCommand.ROTATE, null);

	// Recognize the page and save the document as PDF
	ocrPage.recognize(null);
	ocrDocument.save(pdfFileName, DocumentFormat.PDF, null);
	System.out.println("File saved successfully.");
}
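To make the purpose of the INVERT step more concrete, the sketch below shows the general idea in plain Java: a page that is mostly dark is assumed to be light text on a dark background and is flipped so that the text becomes dark on light. This is a conceptual illustration only. It is not how LEADTOOLS implements OcrAutoPreprocessPageCommand.INVERT, and the class name InvertSketch, the mid-gray threshold of 128, and the 50% cutoff are all assumptions chosen for the example.

```java
import java.awt.image.BufferedImage;

// Conceptual illustration only; not the LEADTOOLS implementation of INVERT.
final class InvertSketch {
    // Returns the fraction of pixels whose average channel value is below mid-gray (128).
    static double darkFraction(BufferedImage img) {
        long dark = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if ((r + g + b) / 3 < 128) {
                    dark++;
                }
            }
        }
        return (double) dark / ((long) img.getWidth() * img.getHeight());
    }

    // Inverts the image in place when more than half of its pixels are dark.
    // Returns true if the image was inverted.
    static boolean invertIfMostlyDark(BufferedImage img) {
        if (darkFraction(img) <= 0.5) {
            return false;
        }
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int argb = img.getRGB(x, y);
                // Keep the alpha byte and flip the three color bytes.
                img.setRGB(x, y, (argb & 0xFF000000) | (~argb & 0x00FFFFFF));
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A new TYPE_INT_RGB image starts out entirely black, so it counts as "inverted".
        BufferedImage page = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        System.out.println("Inverted: " + invertIfMostlyDark(page));
    }
}
```

In the tutorial itself, the detection and correction are both handled by the single autoPreprocess call, so no code like this is needed in the project.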

Run the Project

Run the project by selecting Run -> Run.

If the steps were followed correctly, the application runs OCR on the TIFF, prints "File saved successfully." to the console, and writes a cleaned-up, searchable PDF document to the specified output path.
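A quick way to confirm that the run produced a PDF is to check that the output file exists and begins with the standard PDF header bytes "%PDF-". The sketch below does this with the Java standard library only. The class name PdfCheck and its methods are hypothetical, and the default path is the output path used in this tutorial. The check confirms only the file header; it does not verify that the text layer is searchable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
final class PdfCheck {
    // Every PDF file starts with these five ASCII bytes.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true when the given leading bytes start with the PDF header.
    static boolean hasPdfHeader(byte[] head) {
        return head.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    // Returns true when the file can be read and starts with the PDF header.
    // A missing or unreadable file yields false.
    static boolean isPdfFile(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return hasPdfHeader(in.readNBytes(MAGIC.length));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default is the output path used in this tutorial; pass another path as the first argument.
        Path pdf = Paths.get(args.length > 0
                ? args[0]
                : "C:\\LEADTOOLS23\\Resources\\Images\\cleanupTIF.pdf");
        System.out.println(pdf + (isPdfFile(pdf) ? " looks like a PDF." : " is missing or not a PDF."));
    }
}
```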

Wrap-up

This tutorial showed how to start the LEAD OCR Engine, load an image, auto-preprocess it with the deskew, invert, and rotate commands, recognize its text, and save the results as a searchable PDF.
