PreprocessPageCommands Property

Summary

Gets a list of the auto-preprocess commands to perform on each document page prior to recognition.

Syntax

C#

public IList<OcrAutoPreprocessPageCommand> PreprocessPageCommands { get; }

Objective-C

@property (nonatomic, strong, readonly) NSMutableArray<NSNumber *> *preprocessPageCommands

Java

public List<OcrAutoPreprocessPageCommand> getPreprocessPageCommands()

C++/CLI

property IList<OcrAutoPreprocessPageCommand>^ PreprocessPageCommands { 
   IList<OcrAutoPreprocessPageCommand>^ get(); 
}

Python

PreprocessPageCommands # get  (IOcrAutoRecognizeManager)

Property Value

A list of OcrAutoPreprocessPageCommand commands to perform on each document page prior to recognition. The default value is a list containing one item (OcrAutoPreprocessPageCommand.Deskew).
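
Because the property is read-only, the set of commands is changed by modifying the returned list rather than by assigning a new one. The following is a minimal, self-contained sketch of that pattern. It does not use the LEADTOOLS classes: `Command` and `Manager` are hypothetical stand-ins for OcrAutoPreprocessPageCommand and the auto-recognize manager, and only the DESKEW and ROTATE names and the single-item default are taken from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreprocessCommandsSketch {
    // Hypothetical stand-in for OcrAutoPreprocessPageCommand (only two values modeled)
    enum Command { DESKEW, ROTATE }

    // Hypothetical stand-in for the manager: exposes a getter only, like the real property
    static class Manager {
        // Assumed default, as described above: a single Deskew command
        private final List<Command> commands =
            new ArrayList<>(Arrays.asList(Command.DESKEW));

        List<Command> getPreprocessPageCommands() {
            return commands;
        }
    }

    // Replaces whatever is in the list with "deskew, then rotate"
    static List<Command> configure(Manager manager) {
        List<Command> commands = manager.getPreprocessPageCommands();
        commands.clear();
        commands.add(Command.DESKEW);
        commands.add(Command.ROTATE);
        return commands;
    }

    public static void main(String[] args) {
        Manager manager = new Manager();
        System.out.println("Default:    " + manager.getPreprocessPageCommands());
        System.out.println("Configured: " + configure(manager));
    }
}
```

Clearing the list first avoids adding a second Deskew entry on top of the default one.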

Remarks

The IOcrAutoRecognizeManager interface also has the following options to use with the Run, RunJob and RunJobAsync methods:

Option	Description
IOcrAutoRecognizeManager.MaximumPagesBeforeLtd	Adds support for converting a document with an unlimited number of pages. An OCR recognition operation on a document that contains a large number of pages (10 or more) might result in an out-of-memory error. All of the LEADTOOLS OCR engines support saving the intermediate recognition results to a temporary LTD file (DocumentFormat.LTD). The results of subsequent pages are appended to this temporary file. When all the pages of the document have been recognized, the engine converts the temporary LTD file to the desired output format. The IOcrAutoRecognizeManager.MaximumPagesBeforeLtd property defines the maximum number of pages processed as a whole. For example, if the original document has 20 pages and the value of this property is 8, the engine recognizes the first 8 pages and saves the results to a temporary file, recognizes the second 8 pages and appends the results, and finally recognizes the last 4 pages and converts the temporary document to the final format.
IOcrAutoRecognizeManager.PreprocessPageCommands	Holds a list of OcrAutoPreprocessPageCommand items that control which auto-preprocess operations to perform on each document page prior to recognition.
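
The batching arithmetic in the MaximumPagesBeforeLtd description can be illustrated with a short sketch. This is not LEADTOOLS code and does not reflect how the engine is implemented internally. `LtdBatchSketch` and `batchSizes` are hypothetical names, and the sketch only models positive property values, since this page does not say how other values are treated.

```java
import java.util.ArrayList;
import java.util.List;

public class LtdBatchSketch {
    // Splits totalPages into consecutive batches of at most maxPagesBeforeLtd pages.
    // Assumption: only positive batch sizes are modeled here.
    static List<Integer> batchSizes(int totalPages, int maxPagesBeforeLtd) {
        if (totalPages < 0 || maxPagesBeforeLtd <= 0) {
            throw new IllegalArgumentException(
                "totalPages must be >= 0 and maxPagesBeforeLtd must be > 0");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = totalPages; remaining > 0; remaining -= maxPagesBeforeLtd) {
            // The last batch holds whatever is left over
            sizes.add(Math.min(remaining, maxPagesBeforeLtd));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // The 20-page document with a limit of 8 from the description above
        System.out.println(batchSizes(20, 8)); // prints [8, 8, 4]
    }
}
```

Each entry corresponds to one group of pages that is recognized and appended to the temporary LTD file before the final conversion.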

Example

C#

using System; 
using System.Collections.Generic; 
using System.IO; 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
using Leadtools.Forms.Common; 
using Leadtools.WinForms; 
 
public void OcrAutoRecognizeManagerExample() 
{ 
   Console.WriteLine("Preparing the source and destination directories..."); 
 
   string sourceDirectory = LEAD_VARS.ImagesDir; 
   string destinationDirectory = Path.Combine(LEAD_VARS.ImagesDir, "AutoRecognizeManagerExample"); 
 
   // Prepare the output directory 
   if (!Directory.Exists(destinationDirectory)) 
   { 
      Directory.CreateDirectory(destinationDirectory); 
   } 
 
   // OCR some images from the source directory into the destination directory: 
   IList<string> imageFiles = new List<string>(); 
 
   for (int i = 1; i <= 4; i++) 
   { 
      imageFiles.Add(Path.Combine(sourceDirectory, string.Format("Ocr{0}.tif", i))); 
   } 
 
   Console.WriteLine("Creating an instance of the engine..."); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD)) 
   { 
      // Start the engine using default parameters 
      Console.WriteLine("Starting up the engine..."); 
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
      IOcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.AutoRecognizeManager; 
 
      // Use LTD as a temporary format if a document has more than 4 pages to save memory 
      ocrAutoRecognizeManager.MaximumPagesBeforeLtd = 4; 
 
      // Use maximum CPUs/cores of current machine to speed up recognition 
      // Either passing 0 or System.Environment.ProcessorCount 
      ocrAutoRecognizeManager.MaximumThreadsPerJob = 0; 
 
      // Deskew and auto-orient all pages before recognition 
      ocrAutoRecognizeManager.PreprocessPageCommands.Clear(); 
      ocrAutoRecognizeManager.PreprocessPageCommands.Add(OcrAutoPreprocessPageCommand.Deskew); 
      ocrAutoRecognizeManager.PreprocessPageCommands.Add(OcrAutoPreprocessPageCommand.Rotate); 
 
      // Create PDFs with Image/Text option 
      PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions; 
      pdfOptions.ImageOverText = true; 
      ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions); 
 
      // Loop through all the TIF files in the source directory, convert to PDF in the destination directory 
      foreach (string imageFile in imageFiles) 
      { 
         // Construct the name of the document file 
         string documentFileName = Path.Combine(destinationDirectory, Path.GetFileNameWithoutExtension(imageFile)); 
         documentFileName = Path.ChangeExtension(documentFileName, "pdf"); 
 
         // OCR the file 
         Console.WriteLine("Processing {0}", imageFile); 
         ocrAutoRecognizeManager.Run(imageFile, documentFileName, DocumentFormat.Pdf, null, null); 
         Console.WriteLine("Saved: {0}", documentFileName); 
      } 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS23\Resources\Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime"; 
}

Java

 
import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.FileWriter; 
import java.io.FilenameFilter; 
import java.io.IOException; 
import java.nio.file.Files; 
import java.nio.file.Path; 
import java.nio.file.Paths; 
import java.util.ArrayList; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
import java.util.concurrent.atomic.AtomicInteger; 
 
import org.junit.*; 
import org.junit.runner.JUnitCore; 
import org.junit.runner.Result; 
import org.junit.runner.notification.Failure; 
 
import static org.junit.Assert.*; 
 
import leadtools.*; 
import leadtools.document.writer.*; 
import leadtools.internal.AutoResetEvent; 
import leadtools.ocr.*; 
 
 
public void OcrAutoRecognizeManagerExample() throws IOException { 
 
   final String LEAD_VARS_ImagesDir = "C:\\LEADTOOLS23\\Resources\\Images"; 
   final String LEAD_VARS_OcrLEADRuntimeDir = "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"; 
   System.out.println("Preparing the source and destination directories..."); 
 
   String srcDir = LEAD_VARS_ImagesDir; 
   String destDir = Paths.get(LEAD_VARS_ImagesDir, "AutoRecognizeManagerExample").toString(); 
 
   // Prepare the output directory 
   Path destPath = Paths.get(destDir); 
   Files.createDirectories(destPath); 
 
   // OCR some images from the source directory into the destination directory: 
   ArrayList<String> imageFiles = new ArrayList<String>(); 
 
   for (int i = 1; i <= 4; i++) { 
      imageFiles.add(Paths.get(srcDir, "Ocr" + i + ".tif").toString()); 
   } 
 
   System.out.println("Creating an instance of the engine..."); 
 
   // Create an instance of the engine 
   OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
 
   // Start the engine using default parameters 
   System.out.println("Starting up the engine..."); 
   ocrEngine.startup(null, null, null, LEAD_VARS_OcrLEADRuntimeDir); 
 
   OcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.getAutoRecognizeManager(); 
 
   // Use maximum CPUs/cores of current machine to speed up recognition 
   // Passing 0 uses all available CPUs/cores 
   ocrAutoRecognizeManager.setMaximumThreadsPerJob(0); 
 
   // Deskew and auto-orient all pages before recognition 
   ocrAutoRecognizeManager.getPreprocessPageCommands().clear(); 
   ocrAutoRecognizeManager.getPreprocessPageCommands().add(OcrAutoPreprocessPageCommand.DESKEW); 
   ocrAutoRecognizeManager.getPreprocessPageCommands().add(OcrAutoPreprocessPageCommand.ROTATE); 
 
   // Create PDFs with Image/Text option 
   PdfDocumentOptions pdfOptions = (PdfDocumentOptions) ocrEngine.getDocumentWriterInstance().getOptions(DocumentFormat.PDF); 
   pdfOptions.setImageOverText(true); 
   ocrEngine.getDocumentWriterInstance().setOptions(DocumentFormat.PDF, pdfOptions); 
 
   // Loop through all the TIF files in the source directory, convert to PDF in the 
   // destination directory 
   for (String imageFile : imageFiles) { 
      // Construct the name of the document file 
      // (the last 8 characters of each path are the file name, e.g. "Ocr1.tif") 
      String documentFileName = Paths.get( 
         destDir, 
         imageFile.substring(imageFile.length()-8, imageFile.lastIndexOf(".")) 
      ).toString(); 
      documentFileName = (documentFileName + ".pdf"); 
 
      // OCR the file 
      System.out.println("Processing " + imageFile); 
      ocrAutoRecognizeManager.run(imageFile, documentFileName, DocumentFormat.PDF, null); 
      assertTrue("file unsuccessfully saved", (new File(documentFileName)).exists()); 
      System.out.println("Saved: " + documentFileName); 
   } 
   ocrEngine.dispose(); 
}

Requirements

Target Platforms

Reference

IOcrAutoRecognizeManager Interface

IOcrAutoRecognizeManager Members

Programming with the LEADTOOLS .NET OCR

Help Version 23.0.2024.4.19
