Take the following steps to create and run a program that shows how to scan a document and convert it to a searchable PDF file.
Start Visual Studio
Choose File->New->Project from the menu
In the New Project dialog box, choose either "Visual C# Projects" or "VB Projects" in the Projects Type List, and choose "Windows Application" in Visual Studio 2005 or "Windows Forms Application" in Visual Studio 2008 from the Templates List
Type "OcrTutorial3" in the Project Name field. If desired, type a new location for your project or select a directory using the Browse button, and then choose OK.
In the "Solution Explorer" window, right-click the "References" folder and select "Add Reference..." from the context menu. In the "Add Reference" dialog box, select the ".NET" tab, browse to the LEADTOOLS for .NET "<LEADTOOLS_INSTALLDIR>\Bin\DotNet\Win32" folder, and select the following DLLs:
Note: The Leadtools.Codecs.*.dll references added are for the BMP, JPG, CMP, TIF and FAX image file formats. Add any additional file format codec DLLs that your application requires.
Drag and drop three buttons onto Form1. Leave the button names as the defaults ("button1, button2 ..."), then change the Text property of each button to the following:
Button     Text
button1    Change output directory
button2    Select the Scanner
button3    Scan and OCR
Switch to the Form1 code view (right-click Form1 in the Solution Explorer, then select View Code) and add the following lines at the beginning of the file, after any existing Imports or using section:
Imports Leadtools
Imports Leadtools.Codecs
Imports Leadtools.Twain
Imports Leadtools.Forms
Imports Leadtools.Forms.DocumentWriters
Imports Leadtools.Forms.Ocr
Imports Leadtools.ImageProcessing.Core

using Leadtools;
using Leadtools.Codecs;
using Leadtools.Twain;
using Leadtools.Forms;
using Leadtools.Forms.DocumentWriters;
using Leadtools.Forms.Ocr;
using Leadtools.ImageProcessing.Core;
Add the following private variables to the Form1 class:
' The OCR engine instance
Private _ocrEngine As IOcrEngine
' The OCR document
Private _ocrDocument As IOcrDocument
' The Twain session
Private _twainSession As TwainSession
' The output directory for saving PDF files
Private _outputDirectory As String = "C:\MyImages"
' The image processing commands we are going to use to clean the scanned image
Private deskewCmd As DeskewCommand
Private despeckleCmd As DespeckleCommand
Private dotRemoveCmd As DotRemoveCommand
Private holePunchRemoveCmd As HolePunchRemoveCommand
Private lineRemoveCmd As LineRemoveCommand

// The OCR engine instance
private IOcrEngine _ocrEngine;
// The OCR document
private IOcrDocument _ocrDocument;
// The Twain session
private TwainSession _twainSession;
// The output directory for saving PDF files
private string _outputDirectory = @"C:\MyImages";
// The image processing commands we are going to use to clean the scanned image
private DeskewCommand deskewCmd;
private DespeckleCommand despeckleCmd;
private DotRemoveCommand dotRemoveCmd;
private HolePunchRemoveCommand holePunchRemoveCmd;
private LineRemoveCommand lineRemoveCmd;
Add the following code to the Form1 constructor (in VB, you can copy/paste the whole Sub New code from here):
Sub New()
    ' This call is required by the Windows Form Designer.
    InitializeComponent()

    ' Add any initialization after the InitializeComponent() call.
    Dim MY_LICENSE_FILE As String = "d:\temp\TestLic.lic"

    ' Unlock the OCR support
    Dim MY_OCRPRODEVELOPER_KEY As String = "xyz123abc"
    RasterSupport.SetLicense(MY_LICENSE_FILE, MY_OCRPRODEVELOPER_KEY)

    ' Unlock the PDF save support
    Dim MY_OCRPDFDEVELOPER_KEY As String = "abc123xyz"
    RasterSupport.SetLicense(MY_LICENSE_FILE, MY_OCRPDFDEVELOPER_KEY)

    ' Unlock Document support
    Dim MY_DOCDEVELOPER_KEY As String = "123xyzabc"
    RasterSupport.SetLicense(MY_LICENSE_FILE, MY_DOCDEVELOPER_KEY)

    ' Initialize the OCR engine
    _ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

    ' Start up the engine
    _ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 19\Bin\Common\OcrAdvantageRuntime")

    ' Create the OCR document
    _ocrDocument = _ocrEngine.DocumentManager.CreateDocument()

    ' Initialize the Twain scanning session
    _twainSession = New TwainSession()
    _twainSession.Startup(Me, "My Company", "My Product", "My Version", "My Application", TwainStartupFlags.None)

    ' Subscribe to the TwainSession.AcquirePage event to get the image
    AddHandler _twainSession.AcquirePage, AddressOf _twainSession_AcquirePage

    ' Initialize the image processing commands we are going to use

    ' Initialize Deskew
    deskewCmd = New DeskewCommand()

    ' Initialize Despeckle
    despeckleCmd = New DespeckleCommand()

    ' Initialize DotRemove
    dotRemoveCmd = New DotRemoveCommand()
    dotRemoveCmd.Flags = _
        DotRemoveCommandFlags.UseDiagonals Or _
        DotRemoveCommandFlags.UseSize
    dotRemoveCmd.MaximumDotHeight = 8
    dotRemoveCmd.MaximumDotWidth = 8
    dotRemoveCmd.MinimumDotHeight = 2
    dotRemoveCmd.MinimumDotWidth = 2

    ' Initialize HolePunchRemove
    holePunchRemoveCmd = New HolePunchRemoveCommand()
    holePunchRemoveCmd.Flags = _
        HolePunchRemoveCommandFlags.UseDpi Or _
        HolePunchRemoveCommandFlags.UseCount Or _
        HolePunchRemoveCommandFlags.UseLocation
    holePunchRemoveCmd.Location = HolePunchRemoveCommandLocation.Left

    ' Initialize LineRemove
    lineRemoveCmd = New LineRemoveCommand()
    lineRemoveCmd.MaximumLineWidth = 9
    lineRemoveCmd.MinimumLineLength = 400
    lineRemoveCmd.Wall = 15
    lineRemoveCmd.MaximumWallPercent = 10
    lineRemoveCmd.Variance = 3
    lineRemoveCmd.GapLength = 3
End Sub
public Form1()
{
    InitializeComponent();

    // Unlock the OCR support
    string MY_LICENSE_FILE = "d:\\temp\\TestLic.lic";
    string MY_OCRPRODEVELOPER_KEY = "xyz123abc";
    RasterSupport.SetLicense(MY_LICENSE_FILE, MY_OCRPRODEVELOPER_KEY);

    // Unlock the PDF save support
    string MY_OCRPDFDEVELOPER_KEY = "abc123xyz";
    RasterSupport.SetLicense(MY_LICENSE_FILE, MY_OCRPDFDEVELOPER_KEY);

    // Unlock Document support
    string MY_DOCDEVELOPER_KEY = "123xyzabc";
    RasterSupport.SetLicense(MY_LICENSE_FILE, MY_DOCDEVELOPER_KEY);

    // Initialize the OCR engine
    _ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

    // Start up the engine
    _ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 19\Bin\Common\OcrAdvantageRuntime");

    // Create the OCR document
    _ocrDocument = _ocrEngine.DocumentManager.CreateDocument();

    // Initialize the Twain scanning session
    _twainSession = new TwainSession();
    _twainSession.Startup(this, "My Company", "My Product", "My Version", "My Application", TwainStartupFlags.None);

    // Subscribe to the TwainSession.AcquirePage event to get the image
    _twainSession.AcquirePage += new EventHandler<TwainAcquirePageEventArgs>(_twainSession_AcquirePage);

    // Initialize the image processing commands we are going to use

    // Initialize Deskew
    deskewCmd = new DeskewCommand();

    // Initialize Despeckle
    despeckleCmd = new DespeckleCommand();

    // Initialize DotRemove
    dotRemoveCmd = new DotRemoveCommand();
    dotRemoveCmd.Flags =
        DotRemoveCommandFlags.UseDiagonals |
        DotRemoveCommandFlags.UseSize;
    dotRemoveCmd.MaximumDotHeight = 8;
    dotRemoveCmd.MaximumDotWidth = 8;
    dotRemoveCmd.MinimumDotHeight = 2;
    dotRemoveCmd.MinimumDotWidth = 2;

    // Initialize HolePunchRemove
    holePunchRemoveCmd = new HolePunchRemoveCommand();
    holePunchRemoveCmd.Flags =
        HolePunchRemoveCommandFlags.UseDpi |
        HolePunchRemoveCommandFlags.UseCount |
        HolePunchRemoveCommandFlags.UseLocation;
    holePunchRemoveCmd.Location = HolePunchRemoveCommandLocation.Left;

    // Initialize LineRemove
    lineRemoveCmd = new LineRemoveCommand();
    lineRemoveCmd.MaximumLineWidth = 9;
    lineRemoveCmd.MinimumLineLength = 400;
    lineRemoveCmd.Wall = 15;
    lineRemoveCmd.MaximumWallPercent = 10;
    lineRemoveCmd.Variance = 3;
    lineRemoveCmd.GapLength = 3;
}
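The constructor above passes hard-coded paths for the license file and the OCR runtime folder, and a wrong path typically surfaces only as a failure deep inside startup. As a minimal sketch (the StartupChecks class and its methods are hypothetical helpers, not part of LEADTOOLS), the paths could be checked up front using only System.IO:

```csharp
using System;
using System.IO;

// Hypothetical helper class; not part of LEADTOOLS.
public static class StartupChecks
{
    // Returns the full path if the directory exists; otherwise throws
    // an exception whose message names what the directory should hold.
    public static string RequireDirectory(string path, string description)
    {
        if (string.IsNullOrEmpty(path) || !Directory.Exists(path))
            throw new DirectoryNotFoundException(description + " not found: " + path);
        return Path.GetFullPath(path);
    }

    // Same idea for a single file, such as the license file.
    public static string RequireFile(string path, string description)
    {
        if (string.IsNullOrEmpty(path) || !File.Exists(path))
            throw new FileNotFoundException(description + " not found: " + path, path);
        return Path.GetFullPath(path);
    }
}
```

In the constructor, the value returned by StartupChecks.RequireDirectory(@"C:\LEADTOOLS 19\Bin\Common\OcrAdvantageRuntime", "OCR runtime") could then be passed as the last argument of _ocrEngine.Startup, so that a missing folder is reported with a readable message.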
Override the Form1 OnFormClosed method to add the code necessary to shut down the OCR engine and the Twain session when the application terminates:
Protected Overrides Sub OnFormClosed(ByVal e As FormClosedEventArgs)
    ' Destroy the OCR document
    _ocrDocument.Dispose()

    ' Shutdown and dispose the OCR engine
    _ocrEngine.Dispose()

    ' Close the Twain session
    _twainSession.Shutdown()

    MyBase.OnFormClosed(e)
End Sub

protected override void OnFormClosed(FormClosedEventArgs e)
{
    // Destroy the OCR document
    _ocrDocument.Dispose();

    // Shutdown and dispose the OCR engine
    _ocrEngine.Dispose();

    // Close the Twain session
    _twainSession.Shutdown();

    base.OnFormClosed(e);
}
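If the constructor throws partway through (for example, because engine startup fails), some of these fields may still be null when the form closes, and the cleanup code would then raise a NullReferenceException. A minimal sketch of a null-safe disposal helper follows; the Cleanup class is a hypothetical name, and the sketch assumes that IOcrEngine and IOcrDocument implement IDisposable, as the Dispose() calls above suggest:

```csharp
using System;

// Hypothetical helper class; not part of LEADTOOLS.
public static class Cleanup
{
    // Disposes the object if it is non-null, then clears the field
    // so that a second call is a harmless no-op.
    public static void SafeDispose<T>(ref T resource) where T : class, IDisposable
    {
        if (resource != null)
        {
            resource.Dispose();
            resource = null;
        }
    }
}
```

With this helper, OnFormClosed could call Cleanup.SafeDispose(ref _ocrDocument) and Cleanup.SafeDispose(ref _ocrEngine), and guard the _twainSession.Shutdown() call with a plain null check.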
Add the following code for the button1 (Change output directory) control's Click handler:
Private Sub button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles button1.Click
    ' Change the output directory
    Dim dlg As New FolderBrowserDialog()
    dlg.SelectedPath = _outputDirectory
    dlg.ShowNewFolderButton = True
    If (dlg.ShowDialog(Me) = DialogResult.OK) Then
        _outputDirectory = System.IO.Path.GetFullPath(dlg.SelectedPath)
    End If
End Sub

private void button1_Click(object sender, EventArgs e)
{
    // Change the output directory
    FolderBrowserDialog dlg = new FolderBrowserDialog();
    dlg.SelectedPath = _outputDirectory;
    dlg.ShowNewFolderButton = true;
    if (dlg.ShowDialog(this) == DialogResult.OK)
        _outputDirectory = System.IO.Path.GetFullPath(dlg.SelectedPath);
}
Add the following code for the button2 (Select the Scanner) control's Click handler:
Private Sub button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles button2.Click
    ' Select the scanner to use
    _twainSession.SelectSource(Nothing)
End Sub

private void button2_Click(object sender, EventArgs e)
{
    // Select the scanner to use
    _twainSession.SelectSource(null);
}
Add the following code for the button3 (Scan and OCR) control's Click handler:
Private Sub button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles button3.Click
    ' Create the output directory if it does not exist
    If (Not System.IO.Directory.Exists(_outputDirectory)) Then
        System.IO.Directory.CreateDirectory(_outputDirectory)
    End If

    ' Build the output PDF file name
    Dim pdfFileName As String = System.IO.Path.Combine(_outputDirectory, "Scanned.pdf")

    ' First remove all the pages added to the OCR document
    _ocrDocument.Pages.Clear()

    ' Scan the new page(s)
    _twainSession.Acquire(TwainUserInterfaceFlags.Show)

    ' The pages should be added to the OCR document now.
    ' Recognize and save as PDF
    _ocrDocument.Pages.Recognize(Nothing)
    _ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, Nothing)

    ' Show the result PDF file
    System.Diagnostics.Process.Start(pdfFileName)
End Sub

private void button3_Click(object sender, EventArgs e)
{
    // Create the output directory if it does not exist
    if (!System.IO.Directory.Exists(_outputDirectory))
        System.IO.Directory.CreateDirectory(_outputDirectory);

    // Build the output PDF file name
    string pdfFileName = System.IO.Path.Combine(_outputDirectory, "Scanned.pdf");

    // First remove all the pages added to the OCR document
    _ocrDocument.Pages.Clear();

    // Scan the new page(s)
    _twainSession.Acquire(TwainUserInterfaceFlags.Show);

    // The pages should be added to the OCR document now.
    // Recognize and save as PDF
    _ocrDocument.Pages.Recognize(null);
    _ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);

    // Show the result PDF file
    System.Diagnostics.Process.Start(pdfFileName);
}
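The handler above always writes to "Scanned.pdf" in the output directory, so each scan replaces the previous result. If earlier results should be kept, one possible approach is to pick the next unused file name. The following is a minimal sketch; OutputNaming and NextAvailablePdfPath are hypothetical names, not part of the tutorial or of LEADTOOLS:

```csharp
using System;
using System.IO;

// Hypothetical helper class; not part of LEADTOOLS.
public static class OutputNaming
{
    // Returns "<baseName>.pdf" inside the directory if that file does not
    // exist yet; otherwise tries "<baseName>_1.pdf", "<baseName>_2.pdf",
    // and so on until an unused name is found.
    public static string NextAvailablePdfPath(string directory, string baseName)
    {
        string candidate = Path.Combine(directory, baseName + ".pdf");
        int suffix = 1;
        while (File.Exists(candidate))
        {
            candidate = Path.Combine(directory, baseName + "_" + suffix + ".pdf");
            suffix++;
        }
        return candidate;
    }
}
```

In button3_Click, pdfFileName could then be set from OutputNaming.NextAvailablePdfPath(_outputDirectory, "Scanned") instead of the fixed Path.Combine call.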
Add the following private method to handle the AcquirePage event of the TwainSession object:
Private Sub _twainSession_AcquirePage(ByVal sender As Object, ByVal e As TwainAcquirePageEventArgs)
    ' We have a page
    Dim image As RasterImage = e.Image

    ' First, run the image processing commands on it

    ' Deskew
    deskewCmd.Run(image)

    ' Despeckle
    despeckleCmd.Run(image)

    ' The rest of the commands only work on 1 BPP images
    If (image.BitsPerPixel = 1) Then
        ' Dot Remove
        dotRemoveCmd.Run(image)

        ' Hole Punch Remove
        holePunchRemoveCmd.Run(image)

        ' Vertical Line Remove
        lineRemoveCmd.Type = LineRemoveCommandType.Vertical
        lineRemoveCmd.Run(image)

        ' Horizontal Line Remove
        lineRemoveCmd.Type = LineRemoveCommandType.Horizontal
        lineRemoveCmd.Run(image)
    End If

    ' Add the image as a new page to the OCR document
    _ocrDocument.Pages.AddPage(image, Nothing)
End Sub

private void _twainSession_AcquirePage(object sender, TwainAcquirePageEventArgs e)
{
    // We have a page
    RasterImage image = e.Image;

    // First, run the image processing commands on it

    // Deskew
    deskewCmd.Run(image);

    // Despeckle
    despeckleCmd.Run(image);

    // The rest of the commands only work on 1 BPP images
    if (image.BitsPerPixel == 1)
    {
        // Dot Remove
        dotRemoveCmd.Run(image);

        // Hole Punch Remove
        holePunchRemoveCmd.Run(image);

        // Vertical Line Remove
        lineRemoveCmd.Type = LineRemoveCommandType.Vertical;
        lineRemoveCmd.Run(image);

        // Horizontal Line Remove
        lineRemoveCmd.Type = LineRemoveCommandType.Horizontal;
        lineRemoveCmd.Run(image);
    }

    // Add the image as a new page to the OCR document
    _ocrDocument.Pages.AddPage(image, null);
}