Run(Stream,Stream,DocumentFormat,OcrProgressCallback) Method

Summary

Converts an image file in an input stream to a document file in the specified document format in an output stream.

Syntax

C#

public void Run( 
   Stream imageStream, 
   Stream documentStream, 
   DocumentFormat format, 
   OcrProgressCallback callback 
)

VB

Public Sub Run( 
   ByVal imageStream As Stream, 
   ByVal documentStream As Stream, 
   ByVal format As DocumentFormat, 
   ByVal callback As OcrProgressCallback 
)

C++

public:  
   void Run( 
      Stream^ imageStream, 
      Stream^ documentStream, 
      DocumentFormat format, 
      OcrProgressCallback^ callback 
   )

Parameters

imageStream

The stream containing the image.

documentStream

The stream that will contain the resulting document file.

format

The output document format. If this parameter is DocumentFormat.User, then the document is saved using the native engine format set in IOcrDocumentManager.EngineFormat if the engine used supports native formats; otherwise, an exception will be thrown.

callback

Optional callback to show operation progress.

Remarks

This method will perform the following operations:

Trigger the JobStarted event.
Create one or more IOcrDocument objects in which to store the pages. The number of OCR documents created depends on MaximumThreadsPerJob: if it is 0 (use all available CPUs/cores) or greater than 1, and the engine supports multiple threads, more than one document may be created to take part in the recognition process. Each document is created as a disk-based document.
Loop through all the pages in imageStream. For each page:

The page is created using IOcrEngine.CreatePage.

Auto-zoning of the page is performed with IOcrPage.AutoZone.

The OCR data of the page is obtained by calling IOcrPage.Recognize.

For the LEADTOOLS OCR Module - LEAD Engine, the page is added to the document using IOcrDocument.Pages.Add.

For engines other than the LEADTOOLS OCR Module - LEAD Engine: if multiple documents are used, or the current number of recognized pages is greater than the maximum specified in MaximumPagesBeforeLtd, the current recognition data is saved to a temporary LTD file and the OCR document is cleared.
After all pages are processed, they are saved to documentStream using the format specified in format. If LTD was used, the temporary file is converted to the final document using DocumentWriter.Convert and optionally DocumentWriter.AppendLtd.
Delete all OCR documents and temporary files.
Trigger the JobCompleted event.
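
The per-page steps above can be approximated by hand. The following is a minimal single-threaded sketch, not the actual implementation of Run. It assumes the engine has already been started and that imageStream is seekable, follows the LEAD Engine path (pages are added directly to a single document), and omits the job events, the temporary LTD handling, and multi-document cleanup. The class and method names (ManualRunSketch, ConvertPages) are hypothetical, and the overloads used here (RasterCodecs.GetTotalPages and RasterCodecs.Load on a stream, IOcrDocumentManager.CreateDocument with OcrCreateDocumentOptions.AutoDeleteFile, IOcrDocument.Save on a stream) are assumptions that should be checked against the installed LEADTOOLS version.

```csharp
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Ocr;
using Leadtools.Document.Writer;

public static class ManualRunSketch
{
   // Hypothetical helper; assumes ocrEngine.Startup has already been called
   // and that imageStream supports seeking.
   public static void ConvertPages(IOcrEngine ocrEngine, Stream imageStream, Stream documentStream)
   {
      RasterCodecs codecs = ocrEngine.RasterCodecsInstance;
      int pageCount = codecs.GetTotalPages(imageStream);

      // Disk-based document; passing null as the file name is assumed to let
      // the engine choose a temporary file that is deleted on dispose.
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(null, OcrCreateDocumentOptions.AutoDeleteFile))
      {
         for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
         {
            // Rewind before loading each page from the same stream
            imageStream.Seek(0, SeekOrigin.Begin);
            RasterImage image = codecs.Load(imageStream, 0, CodecsLoadByteOrder.BgrOrGray, pageNumber, pageNumber);

            // AutoDispose hands ownership of the image to the OCR page
            using (IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose))
            {
               ocrPage.AutoZone(null);          // find the zones
               ocrPage.Recognize(null);         // obtain the OCR data
               ocrDocument.Pages.Add(ocrPage);  // store the page in the document
            }
         }

         // Save all recognized pages in the requested format
         ocrDocument.Save(documentStream, DocumentFormat.Pdf, null);
      }
   }
}
```

In practice Run is the simpler choice, since it also handles threading, large documents, and the job events described in this topic.
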
Use the JobProgress event or the callback to show the operation progress, or to abort the operation if threading is not used. For more information and an example, refer to OcrProgressCallback.
Use the JobOperation event to get information regarding the current operation being performed. For more information and an example, refer to JobOperation.
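
As a rough illustration of the callback parameter, the sketch below passes an OcrProgressCallback to Run and prints each progress update. It assumes an already started engine, and that the callback receives an IOcrProgressData whose Operation, Percentage, and Status members behave as described in the OcrProgressCallback topic. The class name, the method name, and the CancelRequested flag are hypothetical.

```csharp
using System;
using System.IO;
using Leadtools.Ocr;
using Leadtools.Document.Writer;

public static class RunWithProgressSketch
{
   // Hypothetical flag; a real application might set it from a Cancel button
   public static volatile bool CancelRequested;

   // Assumes ocrEngine.Startup has already been called
   public static void ConvertWithProgress(IOcrEngine ocrEngine, Stream imageStream, Stream documentStream)
   {
      OcrProgressCallback callback = (IOcrProgressData data) =>
      {
         // Report the current operation and its completion percentage
         Console.WriteLine("{0}: {1}%", data.Operation, data.Percentage);

         // Ask the engine to stop if the user requested cancellation
         if (CancelRequested)
            data.Status = OcrProgressStatus.Abort;
      };

      ocrEngine.AutoRecognizeManager.Run(imageStream, documentStream, DocumentFormat.Pdf, callback);
   }
}
```

Because aborting is described above as available only when threading is not used, a caller relying on CancelRequested would presumably also set MaximumThreadsPerJob to 1.
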

The IOcrAutoRecognizeManager interface also has the following options to use with this method:

Option	Description
MaximumPagesBeforeLtd	Adds support for converting a document with an unlimited number of pages. An OCR recognition operation on a document that contains a large number of pages (10 or more) might result in an out-of-memory error. All of the LEADTOOLS OCR engines support saving the intermediate recognition results to a temporary LTD file (DocumentFormat.Ltd). Subsequent pages are appended to this temporary file. After all the pages of the document have been recognized, the engine converts the temporary LTD file to the desired output format. The MaximumPagesBeforeLtd property defines the maximum number of pages to be processed at one time. For example, if the original document has 20 pages and the value of this property is 8, the engine recognizes the first 8 pages and saves the result to a temporary file, recognizes the second set of 8 pages and appends those results, and finally recognizes the last 4 pages and converts the temporary document into the final format.
PreprocessPageCommands	Holds an array of OcrAutoPreprocessPageCommand items that control which auto-preprocess operations are performed on each document page prior to recognition.
MaximumThreadsPerJob	Maximum number of threads to use per job. You can instruct IOcrAutoRecognizeManager to use all available machine CPUs/cores when recognizing a document, which can significantly reduce the time required to finish the OCR operation.
JobErrorMode	Controls whether the operation resumes after non-critical errors. For example, if a source document has a page that could not be recognized, the offending page will be added to the final document as a graphics image and recognition will resume with the next page.
JobStarted, JobProgress, JobOperation and JobCompleted events	Events that track when synchronous and asynchronous jobs have started, are running, and have completed.
AbortAllJobs	Aborts all running and pending jobs.
EnableTrace	Outputs debug messages to the standard .NET trace listeners.
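
The options in the table are set on the IOcrAutoRecognizeManager instance before calling Run. The following is a minimal sketch under stated assumptions: the values are illustrative rather than recommended, the class and method names are hypothetical, and the enumeration names OcrAutoRecognizeManagerJobErrorMode.Continue and OcrAutoPreprocessPageCommand.Deskew, as well as PreprocessPageCommands being a modifiable list, are assumptions to verify against the installed version.

```csharp
using System;
using Leadtools.Ocr;

public static class AutoRecognizeOptionsSketch
{
   // Hypothetical helper; call it before IOcrAutoRecognizeManager.Run
   public static void Configure(IOcrAutoRecognizeManager manager)
   {
      // Flush recognition results to a temporary LTD file every 8 pages
      manager.MaximumPagesBeforeLtd = 8;

      // 0 = use all available CPUs/cores
      manager.MaximumThreadsPerJob = 0;

      // Keep going after non-critical errors (assumed enumeration name)
      manager.JobErrorMode = OcrAutoRecognizeManagerJobErrorMode.Continue;

      // Deskew every page before recognition (assumed list-style API)
      manager.PreprocessPageCommands.Clear();
      manager.PreprocessPageCommands.Add(OcrAutoPreprocessPageCommand.Deskew);

      // Send debug messages to the .NET trace listeners
      manager.EnableTrace = true;

      // Track the job life cycle
      manager.JobStarted += (sender, e) => Console.WriteLine("Job started");
      manager.JobCompleted += (sender, e) => Console.WriteLine("Job completed");
   }
}
```

With MaximumPagesBeforeLtd set to 8 as above, a 20-page input would be processed in passes of 8, 8, and 4 pages, matching the example in the table.
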

Example

using System; 
using System.IO; 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
using Leadtools.Forms.Common; 
using Leadtools.WinForms; 
 
public void OcrAutoRecognizeManagerRun3Example() 
{ 
   string tifFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif"); 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf"); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, false)) 
   { 
      // Start the engine using default parameters 
      Console.WriteLine("Starting up the engine..."); 
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
      IOcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.AutoRecognizeManager; 
 
      using (Stream outputStream = new MemoryStream()) 
      { 
         using (Stream inputStream = File.OpenRead(tifFileName)) 
         { 
            // Recognize the document 
            ocrAutoRecognizeManager.Run(inputStream, outputStream, DocumentFormat.Pdf, null); 
         } 
 
         // Save the result into the output document file 
         outputStream.Seek(0, SeekOrigin.Begin); 
         using (var fileStream = File.Create(pdfFileName)) 
            outputStream.CopyTo(fileStream); 
      } 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS 20\Bin\Common\OcrLEADRuntime"; 
}

Imports System 
Imports System.IO 
Imports Leadtools 
Imports Leadtools.Codecs 
Imports Leadtools.Ocr 
Imports Leadtools.Document.Writer 
Imports Leadtools.Forms.Common 
Imports Leadtools.WinForms 
 
Public Sub OcrAutoRecognizeManagerRun3Example() 
   Dim tifFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif") 
   Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf") 
 
   ' Create an instance of the engine 
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, False) 
      ' Start the engine using default parameters 
      Console.WriteLine("Starting up the engine...") 
      ocrEngine.Startup(Nothing, Nothing, Nothing, LEAD_VARS.OcrLEADRuntimeDir) 
 
      Dim ocrAutoRecognizeManager As IOcrAutoRecognizeManager = ocrEngine.AutoRecognizeManager 
 
      Using outputStream As New MemoryStream() 
         Using inputStream As Stream = File.OpenRead(tifFileName) 
            ' Recognize the document 
            ocrAutoRecognizeManager.Run(inputStream, outputStream, DocumentFormat.Pdf, Nothing) 
         End Using 
 
         ' Save the result into the output document file 
         outputStream.Seek(0, SeekOrigin.Begin) 
         Using fileStream As Stream = File.Create(pdfFileName) 
            outputStream.CopyTo(fileStream) 
         End Using 
      End Using 
   End Using 
End Sub 
 
Public NotInheritable Class LEAD_VARS 
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images" 
   Public Const OcrLEADRuntimeDir As String = "C:\LEADTOOLS 20\Bin\Common\OcrLEADRuntime" 
End Class

Requirements

Target Platforms

Reference

IOcrAutoRecognizeManager Interface

IOcrAutoRecognizeManager Members

Leadtools.Ocr Namespace

Help Version 20.0.2020.4.2

Leadtools.Ocr Assembly
