LEADTOOLS OCR (Leadtools.Forms.Ocr assembly)
LEAD Technologies, Inc

IOcrPage Interface

Defines an image page in an OCR document.
Object Model
IOcrPage Interface, IOcrDocument Interface, IOcrTableZoneManager Interface, IOcrZoneCollection Interface
Syntax
public interface IOcrPage 
'Declaration
 
Public Interface IOcrPage 
'Usage
 
Dim instance As IOcrPage
public interface IOcrPage 
function Leadtools.Forms.Ocr.IOcrPage() 
public interface class IOcrPage 
Remarks

IOcrPage defines a page that has been added to the OCR engine. Each page contains the raster image used to create it (the image used when the page is loaded or added) and a group of OCR zones, added either manually or through auto-zoning.

You can access the pages inside an OCR document (IOcrDocument) through the IOcrDocument.Pages property. The value of this property is an IOcrPageCollection interface. This interface implements the standard .NET ICollection, IList, and IEnumerable interfaces, so you can use the members of these interfaces to add, remove, get, set and iterate through the pages of the document.

You cannot create IOcrPage objects directly. Instead, add pages to the engine through the various AddPage, AddPages, InsertPage and InsertPages methods of the IOcrPageCollection interface. Once a page is added, access it by index to get the IOcrPage object associated with it.

Each page contains a collection of OCR zones, accessed through the Zones member. This member implements the IOcrZoneCollection interface, which implements the same standard .NET collection interfaces as IOcrPageCollection. Hence, you can use Zones to add, remove, get, set and iterate through the zones in the page.

After adding a page to an OCR document and optionally manipulating the zones inside it, call the Recognize or RecognizeText methods to collect the recognition data of the page. This data is stored internally in the page and can later be saved to one of the many document file formats supported by the engine such as PDF or Microsoft Word.
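As a minimal sketch of the RecognizeText path described above, the following helper adds an image, auto-zones it, and returns the recognized text as a string instead of saving a document. The class name RecognizeTextSketch, the method ExtractText, and the imagePath parameter are hypothetical names introduced for illustration; the sketch also assumes that RecognizeText accepts an optional progress callback (passed as null here) and returns the page text, and that the engine starts with default parameters as in the example on this page.

```csharp
using System;
using Leadtools.Forms.Ocr;

static class RecognizeTextSketch
{
   // Hypothetical helper: returns the text recognized on a single image file.
   public static string ExtractText(string imagePath)
   {
      // Create and start the engine with default parameters
      using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false))
      {
         ocrEngine.Startup(null, null, null, null);

         using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
         {
            // Add the image as a page and find its zones automatically
            IOcrPage ocrPage = ocrDocument.Pages.AddPage(imagePath, null);
            ocrPage.AutoZone(null);

            // Assumption: RecognizeText(null) returns the recognized text as a string.
            // Disposing the engine on exit also shuts it down.
            return ocrPage.RecognizeText(null);
         }
      }
   }
}
```

Use Recognize instead when the result should be saved to a document format such as PDF, as the full example on this page does.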

After a page is recognized, examine and modify the recognition data (characters and words) through the GetRecognizedCharacters and SetRecognizedCharacters methods.

Once an IOcrPage object is obtained on a page, you can do the following:

Note: The LEADTOOLS Plus OCR engine does not support images larger than A3 paper size (11.7 by 16.5 inches at 300 dpi). Attempting to add a larger image results in an error. For larger documents, resize the image before adding it to the LEADTOOLS Plus OCR engine. The Professional and Advantage engines do not restrict the image size.
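The A3 limit can be checked before a page is added. The sketch below is plain C# arithmetic with no LEADTOOLS calls; the A3Check class and the FitsWithinA3 method are hypothetical names. It assumes that the limit applies to the physical size of the image (pixels divided by resolution) and that either orientation is acceptable; the engine's exact rule is not stated here, so treat this as an approximation.

```csharp
using System;

static class A3Check
{
   // A3 limits quoted in the note above, in inches
   const double MaxShortSideInches = 11.7;
   const double MaxLongSideInches = 16.5;

   // Returns true when an image of the given pixel size and resolution
   // fits within A3 in either orientation (an assumption of this sketch).
   public static bool FitsWithinA3(int widthPixels, int heightPixels, int dpiX, int dpiY)
   {
      if (dpiX <= 0 || dpiY <= 0)
         throw new ArgumentOutOfRangeException("dpiX", "Resolution must be positive.");

      // Convert the pixel size to inches
      double widthInches = (double)widthPixels / dpiX;
      double heightInches = (double)heightPixels / dpiY;

      double shortSide = Math.Min(widthInches, heightInches);
      double longSide = Math.Max(widthInches, heightInches);

      return shortSide <= MaxShortSideInches && longSide <= MaxLongSideInches;
   }

   static void Main()
   {
      // 3510 x 4950 pixels at 300 dpi is exactly 11.7 x 16.5 inches
      Console.WriteLine(FitsWithinA3(3510, 4950, 300, 300)); // True
      // 5100 x 6600 pixels at 300 dpi is 17 x 22 inches
      Console.WriteLine(FitsWithinA3(5100, 6600, 300, 300)); // False
   }
}
```

When the check fails for the Plus engine, resize the image first, as the note above recommends.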

Example
Public Sub OcrPageExample()
      Dim tifFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif")
      Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf")
      ' Create an instance of the engine
      Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, False)
         ' Start the engine using default parameters
         ocrEngine.Startup(Nothing, Nothing, Nothing, Nothing)

         ' Create an OCR document
         Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
            ' Add this image to the document
            Dim ocrPage As IOcrPage = ocrDocument.Pages.AddPage(tifFileName, Nothing)

            ' Auto-recognize the zones in the page
            ocrPage.AutoZone(Nothing)

            ' Show its information
            Console.WriteLine("Size: {0} by {1} pixels", ocrPage.Width, ocrPage.Height)
            Console.WriteLine("Resolution: {0} by {1} dots/inch", ocrPage.DpiX, ocrPage.DpiY)
            Console.WriteLine("Bits/Pixel: {0}, Bytes/Line: {1}", ocrPage.BitsPerPixel, ocrPage.BytesPerLine)

            Dim palette() As Byte = ocrPage.GetPalette()
            Dim paletteEntries As Integer
            If (Not palette Is Nothing) Then
               paletteEntries = palette.Length \ 3
            Else
               paletteEntries = 0
            End If

            Console.WriteLine("Number of entries in the palette: {0}", paletteEntries)
            Console.WriteLine("Original format of this page: {0}", ocrPage.OriginalFormat)
            Console.WriteLine("Has this page been recognized? : {0}", ocrPage.IsRecognized)
            ShowZonesInfo(ocrPage)

            ' Recognize it and save it as PDF
            ocrPage.Recognize(Nothing)
            ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, Nothing)
         End Using

         ' Shutdown the engine
         ' Note: calling Dispose will also automatically shutdown the engine if it has been started
         ocrEngine.Shutdown()
      End Using
   End Sub

   Private Sub ShowZonesInfo(ByVal ocrPage As IOcrPage)
      Console.WriteLine("Zones:")
      For Each ocrZone As OcrZone In ocrPage.Zones
         Dim index As Integer = ocrPage.Zones.IndexOf(ocrZone)
         Console.WriteLine("Zone index: {0}", index)
         Console.WriteLine("  Id                  {0}", ocrZone.Id)
         Console.WriteLine("  Bounds              {0}", ocrZone.Bounds)
         Console.WriteLine("  ZoneType            {0}", ocrZone.ZoneType)
         Console.WriteLine("  FillMethod:         {0}", ocrZone.FillMethod)
         Console.WriteLine("  RecognitionModule:  {0}", ocrZone.RecognitionModule)
         Console.WriteLine("  CharacterFilters:   {0}", ocrZone.CharacterFilters)
         Console.WriteLine("----------------------------------")
      Next
   End Sub

Public NotInheritable Class LEAD_VARS
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images"
End Class

public void OcrPageExample()
   {
      string tifFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif");
      string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf");
      // Create an instance of the engine
      using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false))
      {
         // Start the engine using default parameters
         ocrEngine.Startup(null, null, null, null);

         // Create an OCR document
         using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
         {
            // Add this image to the document
            IOcrPage ocrPage = ocrDocument.Pages.AddPage(tifFileName, null);

            // Auto-recognize the zones in the page
            ocrPage.AutoZone(null);

            // Show its information
            Console.WriteLine("Size: {0} by {1} pixels", ocrPage.Width, ocrPage.Height);
            Console.WriteLine("Resolution: {0} by {1} dots/inch", ocrPage.DpiX, ocrPage.DpiY);
            Console.WriteLine("Bits/Pixel: {0}, Bytes/Line: {1}", ocrPage.BitsPerPixel, ocrPage.BytesPerLine);

            byte[] palette = ocrPage.GetPalette();
            int paletteEntries;
            if(palette != null)
               paletteEntries = palette.Length / 3;
            else
               paletteEntries = 0;

            Console.WriteLine("Number of entries in the palette: {0}", paletteEntries);
            Console.WriteLine("Original format of this page: {0}", ocrPage.OriginalFormat);
            Console.WriteLine("Has this page been recognized? : {0}", ocrPage.IsRecognized);
            ShowZonesInfo(ocrPage);

            // Recognize it and save it as PDF
            ocrPage.Recognize(null);
            ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);
         }

         // Shutdown the engine
         // Note: calling Dispose will also automatically shutdown the engine if it has been started
         ocrEngine.Shutdown();
      }
   }

   private void ShowZonesInfo(IOcrPage ocrPage)
   {
      Console.WriteLine("Zones:");
      foreach(OcrZone ocrZone in ocrPage.Zones)
      {
         int index = ocrPage.Zones.IndexOf(ocrZone);
         Console.WriteLine("Zone index: {0}", index);
         Console.WriteLine("  Id                  {0}", ocrZone.Id);
         Console.WriteLine("  Bounds              {0}", ocrZone.Bounds);
         Console.WriteLine("  ZoneType            {0}", ocrZone.ZoneType);
         Console.WriteLine("  FillMethod:         {0}", ocrZone.FillMethod);
         Console.WriteLine("  RecognitionModule:  {0}", ocrZone.RecognitionModule);
         Console.WriteLine("  CharacterFilters:   {0}", ocrZone.CharacterFilters);
         Console.WriteLine("----------------------------------");
      }
   }

static class LEAD_VARS
{
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images";
}

[TestMethod]
public async Task OcrPageExample()
{
   string tifFileName = @"Assets\Ocr1.tif";
   string pdfFileName = "Ocr1.pdf";
   // Create an instance of the engine
   IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

   // Start the engine using default parameters
   ocrEngine.Startup(null, null, String.Empty, Tools.OcrEnginePath);

   // Create an OCR document
   IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();

   // Add this image to the document
   IOcrPage ocrPage = null;
   using (RasterCodecs codecs = new RasterCodecs())
   {
      StorageFile loadFile = await Tools.AppInstallFolder.GetFileAsync(tifFileName);
      using (RasterImage image = await codecs.LoadAsync(LeadStreamFactory.Create(loadFile)))
         ocrPage = ocrDocument.Pages.AddPage(image, null);
   }

   // Auto-recognize the zones in the page
   ocrPage.AutoZone(null);

   // Show its information
   Debug.WriteLine("Size: {0} by {1} pixels", ocrPage.Width, ocrPage.Height);
   Debug.WriteLine("Resolution: {0} by {1} dots/inch", ocrPage.DpiX, ocrPage.DpiY);
   Debug.WriteLine("Bits/Pixel: {0}, Bytes/Line: {1}", ocrPage.BitsPerPixel, ocrPage.BytesPerLine);

   byte[] palette = ocrPage.GetPalette();
   int paletteEntries;
   if(palette != null)
      paletteEntries = palette.Length / 3;
   else
      paletteEntries = 0;

   Debug.WriteLine("Number of entries in the palette: {0}", paletteEntries);
   Debug.WriteLine("Original format of this page: {0}", ocrPage.OriginalFormat);
   Debug.WriteLine("Has this page been recognized? : {0}", ocrPage.IsRecognized);
   ShowZonesInfo(ocrPage);

   // Recognize it and save it as PDF
   ocrPage.Recognize(null);
   StorageFile saveFile = await Tools.AppLocalFolder.CreateFileAsync(pdfFileName, CreationCollisionOption.ReplaceExisting);
   await ocrDocument.SaveAsync(LeadStreamFactory.Create(saveFile), DocumentFormat.Pdf, null);

   // Shutdown the engine
   ocrEngine.Shutdown();
}

private void ShowZonesInfo(IOcrPage ocrPage)
{
   Debug.WriteLine("Zones:");
   foreach(OcrZone ocrZone in ocrPage.Zones)
   {
      int index = ocrPage.Zones.IndexOf(ocrZone);
      Debug.WriteLine("Zone index: {0}", index);
      Debug.WriteLine("  Id                  {0}", ocrZone.Id);
      Debug.WriteLine("  Bounds              {0}", ocrZone.Bounds);
      Debug.WriteLine("  ZoneType            {0}", ocrZone.ZoneType);
      Debug.WriteLine("  FillMethod:         {0}", ocrZone.FillMethod);
      Debug.WriteLine("  RecognitionModule:  {0}", ocrZone.RecognitionModule);
      Debug.WriteLine("  CharacterFilters:   {0}", ocrZone.CharacterFilters);
      Debug.WriteLine("----------------------------------");
   }
}
Requirements

Target Platforms: Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows Server 2008 (Server Core not supported), Windows Server 2008 R2 (Server Core supported with SP1 or later), Windows Server 2003 SP2

See Also

Reference

IOcrPage Members
Leadtools.Forms.Ocr Namespace
OcrEngineManager Class
OcrEngineType Enumeration
IOcrPageCollection Interface
IOcrZoneCollection Interface
OcrZone Structure
Programming with the LEADTOOLS .NET OCR
Working with OCR Pages


 

 



© 2006-2012 All Rights Reserved. LEAD Technologies, Inc.

IOcrPage requires an OCR module license and unlock key. For more information, refer to: Imaging Pro/Document/Medical Features