C#: public IOcrPageCharacters GetRecognizedCharacters()
VB: Function GetRecognizedCharacters() As IOcrPageCharacters
Objective-C: - (nullable LTOcrPageCharacters *)recognizedCharacters:(NSError **)error
Java: public OcrPageCharacters getRecognizedCharacters()
C++/CLI: IOcrPageCharacters^ GetRecognizedCharacters();
An instance of IOcrPageCharacters containing the last recognized characters data of this IOcrPage.
Call this method only after the IOcrPage has been recognized with the Recognize method. If the value of the IsRecognized property of this page is false, calling this method throws an exception.
You can use GetRecognizedCharacters to examine the recognized character data. This data contains information about the character codes, their confidence, guess codes, location and position in the page, as well as font information. For more information, refer to OcrCharacter.
The GetRecognizedCharacters method returns an instance of IOcrPageCharacters, which is a collection of IOcrZoneCharacters. The IOcrZoneCharacters.ZoneIndex property contains the zero-based index of the zone. You can get the zone information by using the same index into the Zones property of this IOcrPage.
To modify the recognition data and apply it back to the page, use SetRecognizedCharacters.
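The precondition above can be guarded in code. The following is a minimal C# sketch, assuming an IOcrPage that was created from an already started engine. The class and helper names are hypothetical; only IsRecognized, Recognize and GetRecognizedCharacters come from this topic.

```csharp
using Leadtools.Ocr;

public static class RecognizedCharactersGuard
{
   // Hypothetical helper: returns the recognized characters of a page,
   // recognizing the page first if that has not happened yet.
   public static IOcrPageCharacters GetCharactersSafely(IOcrPage ocrPage)
   {
      // GetRecognizedCharacters throws when IsRecognized is false,
      // so run recognition first when needed
      if (!ocrPage.IsRecognized)
         ocrPage.Recognize(null);

      return ocrPage.GetRecognizedCharacters();
   }
}
```

Whether to recognize on demand or to report the unrecognized page to the caller is a design choice; the sketch recognizes on demand only to keep the example short.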
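The relationship between IOcrZoneCharacters.ZoneIndex and the Zones property can be sketched as follows. This is an illustrative C# fragment, assuming a page that has already been recognized. The class and helper names are hypothetical, and the use of OcrZone.Bounds for display is an assumption based on the OcrZone reference rather than on this topic.

```csharp
using System;
using Leadtools.Ocr;

public static class ZoneCharactersSummary
{
   // Hypothetical helper: prints one line per zone with its character count
   public static void PrintZoneSummary(IOcrPage ocrPage)
   {
      IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters();

      foreach (IOcrZoneCharacters ocrZoneCharacters in ocrPageCharacters)
      {
         // ZoneIndex is zero-based and indexes into the page's Zones collection
         int zoneIndex = ocrZoneCharacters.ZoneIndex;
         OcrZone zone = ocrPage.Zones[zoneIndex];

         // OcrZone.Bounds is assumed here; it is not described in this topic
         Console.WriteLine("Zone {0} at {1}: {2} characters",
            zoneIndex, zone.Bounds, ocrZoneCharacters.Count);
      }
   }
}
```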
Use IOcrZoneCharacters.GetWords to get the recognized words of a zone.
Notes on spaces: The LEADTOOLS OCR Module - LEAD Engine will not return any space characters when using the GetRecognizedCharacters method.
The LEADTOOLS OCR Module - OmniPage Engine will not return space characters if the boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you absolutely require space characters in the recognition results when using the LEADTOOLS OmniPage Engine, set this setting to true (ocrEngineInstance.SettingManager.SetBooleanValue("Recognition.SpaceIsValidCharacter", true)). For more information on OCR settings, refer to IOcrSettingManager and LEADTOOLS OCR Module - OmniPage Engine Settings.
The SetRecognizedCharacters method will accept space characters in the LEADTOOLS LEAD engine. However, these space characters will be used when generating the final document (PDF) and might affect the final output. Therefore, it is not recommended that you insert space characters when using the LEADTOOLS LEAD engine.
The LEADTOOLS OCR Module - OmniPage Engine will strip any space characters from the results passed to SetRecognizedCharacters if the boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you absolutely require space characters in the recognition results when using the LEADTOOLS OmniPage Engine, set this setting to true before calling SetRecognizedCharacters.
If you use the GetRecognizedCharacters and SetRecognizedCharacters methods to modify the recognition result prior to saving to an output file, and you plan to use the engine's native save capability (by setting the IOcrDocumentManager.EngineFormat property and using DocumentFormat.User in the IOcrDocument.Save method), then you must change the boolean Recognition.SpaceIsValidCharacter setting to true.
The IOcrPageCharacters interface also contains the IOcrPageCharacters.UpdateWord method, which lets you modify the OCR recognition results by updating or deleting words before optionally saving the results to the final output document.
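The notes on spaces for the OmniPage engine can be summarized in a short round trip. This is a hedged C# sketch, assuming ocrEngine is a started LEADTOOLS OmniPage engine and ocrPage belongs to it. The class and helper names are hypothetical, and enabling the setting before Recognize is a conservative choice rather than a requirement stated in this topic.

```csharp
using Leadtools.Ocr;

public static class SpaceAwareRoundTrip
{
   // Hypothetical helper: keeps space characters through a get/modify/set round trip
   public static void RoundTripWithSpaces(IOcrEngine ocrEngine, IOcrPage ocrPage)
   {
      // The default is false, in which case spaces are neither returned by
      // GetRecognizedCharacters nor kept by SetRecognizedCharacters
      ocrEngine.SettingManager.SetBooleanValue("Recognition.SpaceIsValidCharacter", true);

      ocrPage.Recognize(null);
      IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters();

      // ... examine or modify the characters here ...

      // The setting is still true at this point, so spaces are not stripped
      ocrPage.SetRecognizedCharacters(ocrPageCharacters);
   }
}
```

This sketch does not apply to the LEADTOOLS LEAD engine, which does not return space characters from GetRecognizedCharacters.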
This example will get the recognized characters of a page, modify them and set them back before saving the final document.
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Ocr;
using Leadtools.Forms.Common;
using Leadtools.Document.Writer;
using Leadtools.WinForms;
using Leadtools.Drawing;
using Leadtools.ImageProcessing;
using Leadtools.ImageProcessing.Color;

public void RecognizedCharactersExample()
{
   // Create an image with some text in it
   RasterImage image = new RasterImage(RasterMemoryFlags.Conventional, 640, 200, 24, RasterByteOrder.Bgr, RasterViewPerspective.TopLeft, null, IntPtr.Zero, 0);
   Rectangle imageRect = new Rectangle(0, 0, image.ImageWidth, image.ImageHeight);

   IntPtr hdc = RasterImagePainter.CreateLeadDC(image);
   using (Graphics g = Graphics.FromHdc(hdc))
   {
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality;
      g.FillRectangle(Brushes.White, imageRect);

      using (Font f = new Font("Arial", 20, FontStyle.Regular))
         g.DrawString("Normal line", f, Brushes.Black, 0, 0);

      using (Font f = new Font("Arial", 20, FontStyle.Bold))
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40);

      using (Font f = new Font("Courier New", 20, FontStyle.Regular))
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80);
   }
   RasterImagePainter.DeleteLeadDC(hdc);

   string textFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt");
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf");

   // Create an instance of the engine
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD))
   {
      // Start the engine using default parameters
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir);

      // Create an OCR page
      IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose);

      // Recognize this page
      ocrPage.Recognize(null);

      // Dump the characters into a text file
      using (StreamWriter writer = File.CreateText(textFileName))
      {
         IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters();
         foreach (IOcrZoneCharacters ocrZoneCharacters in ocrPageCharacters)
         {
            // Show the words found in this zone. Get the word boundaries in inches
            ICollection<OcrWord> words = ocrZoneCharacters.GetWords();
            Console.WriteLine("Words:");
            foreach (OcrWord word in words)
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex);

            bool nextCharacterIsNewWord = true;

            for (int i = 0; i < ocrZoneCharacters.Count; i++)
            {
               OcrCharacter ocrCharacter = ocrZoneCharacters[i];

               // Capitalize the first letter if this is a new word
               if (nextCharacterIsNewWord)
                  ocrCharacter.Code = Char.ToUpper(ocrCharacter.Code);

               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}",
                  ocrCharacter.Code,
                  ocrCharacter.Confidence,
                  ocrCharacter.WordIsCertain,
                  ocrCharacter.Bounds,
                  ocrCharacter.Position,
                  ocrCharacter.FontSize,
                  ocrCharacter.FontStyle);

               // If the character is bold, also make it italic and underlined
               if ((ocrCharacter.FontStyle & OcrCharacterFontStyle.Bold) == OcrCharacterFontStyle.Bold)
               {
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Italic;
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Underline;
               }

               // Check if the next character is the start of a new word
               if ((ocrCharacter.Position & OcrCharacterPosition.EndOfWord) == OcrCharacterPosition.EndOfWord ||
                   (ocrCharacter.Position & OcrCharacterPosition.EndOfLine) == OcrCharacterPosition.EndOfLine)
                  nextCharacterIsNewWord = true;
               else
                  nextCharacterIsNewWord = false;

               // OcrCharacter is copied out of the collection, so store the modified copy back
               ocrZoneCharacters[i] = ocrCharacter;
            }
         }

         // Replace the characters with the modified ones before we save
         ocrPage.SetRecognizedCharacters(ocrPageCharacters);
      }

      // Create an OCR document so we can save the results
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(null, OcrCreateDocumentOptions.AutoDeleteFile))
      {
         // Add the page and dispose it
         ocrDocument.Pages.Add(ocrPage);
         ocrPage.Dispose();

         // Set the PDF options to save as PDF/A text only
         PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions;
         pdfOptions.DocumentType = PdfDocumentType.PdfA;
         pdfOptions.ImageOverText = false;
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions);

         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);

         // Open and check the result file, it should contain the following text
         // "Normal Line"
         // "Bold, Italic And Underline"
         // "Monospaced Line"
         // With the second line bold, italic and underlined now
      }

      // Shutdown the engine
      // Note: calling Dispose will also automatically shutdown the engine if it has been started
      ocrEngine.Shutdown();
   }
}

static class LEAD_VARS
{
   public const string ImagesDir = @"C:\LEADTOOLS21\Resources\Images";
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS21\Bin\Common\OcrLEADRuntime";
}
Imports System
Imports System.Collections.Generic
Imports System.Drawing
Imports System.IO
Imports Leadtools
Imports Leadtools.Codecs
Imports Leadtools.Ocr
Imports Leadtools.Forms
Imports Leadtools.Document.Writer
Imports Leadtools.WinForms
Imports Leadtools.Drawing
Imports Leadtools.ImageProcessing
Imports Leadtools.ImageProcessing.Color

Public Sub RecognizedCharactersExample()
   ' Create an image with some text in it
   Dim image As New RasterImage(RasterMemoryFlags.Conventional, 640, 200, 24, RasterByteOrder.Bgr, RasterViewPerspective.TopLeft, Nothing, IntPtr.Zero, 0)
   Dim imageRect As New Rectangle(0, 0, image.ImageWidth, image.ImageHeight)

   Dim hdc As IntPtr = RasterImagePainter.CreateLeadDC(image)
   Using g As Graphics = Graphics.FromHdc(hdc)
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality
      g.FillRectangle(Brushes.White, imageRect)

      Using f As New Font("Arial", 20, FontStyle.Regular)
         g.DrawString("Normal line", f, Brushes.Black, 0, 0)
      End Using

      Using f As New Font("Arial", 20, FontStyle.Bold)
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40)
      End Using

      Using f As New Font("Courier New", 20, FontStyle.Regular)
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80)
      End Using
   End Using
   RasterImagePainter.DeleteLeadDC(hdc)

   Dim textFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt")
   Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf")

   ' Create an instance of the engine
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD)
      ' Start the engine using default parameters
      ocrEngine.Startup(Nothing, Nothing, Nothing, LEAD_VARS.OcrLEADRuntimeDir)

      ' Create an OCR page
      Dim ocrPage As IOcrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose)

      ' Recognize this page
      ocrPage.Recognize(Nothing)

      ' Dump the characters into a text file
      Using writer As StreamWriter = File.CreateText(textFileName)
         Dim ocrPageCharacters As IOcrPageCharacters = ocrPage.GetRecognizedCharacters()
         For Each ocrZoneCharacters As IOcrZoneCharacters In ocrPageCharacters
            ' Show the words found in this zone
            Dim words As ICollection(Of OcrWord) = ocrZoneCharacters.GetWords()
            Console.WriteLine("Words:")
            For Each word As OcrWord In words
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex)
            Next

            Dim nextCharacterIsNewWord As Boolean = True

            For i As Integer = 0 To ocrZoneCharacters.Count - 1
               Dim ocrCharacter As OcrCharacter = ocrZoneCharacters(i)

               ' Capitalize the first letter if this is a new word
               If nextCharacterIsNewWord Then
                  ocrCharacter.Code = [Char].ToUpper(ocrCharacter.Code)
               End If

               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}",
                  ocrCharacter.Code,
                  ocrCharacter.Confidence,
                  ocrCharacter.WordIsCertain,
                  ocrCharacter.Bounds,
                  ocrCharacter.Position,
                  ocrCharacter.FontSize,
                  ocrCharacter.FontStyle)

               ' If the character is bold, also make it italic and underlined
               If (ocrCharacter.FontStyle And OcrCharacterFontStyle.Bold) = OcrCharacterFontStyle.Bold Then
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Italic
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Underline
               End If

               ' Check if the next character is the start of a new word
               If (ocrCharacter.Position And OcrCharacterPosition.EndOfWord) = OcrCharacterPosition.EndOfWord OrElse (ocrCharacter.Position And OcrCharacterPosition.EndOfLine) = OcrCharacterPosition.EndOfLine Then
                  nextCharacterIsNewWord = True
               Else
                  nextCharacterIsNewWord = False
               End If

               ' OcrCharacter is copied out of the collection, so store the modified copy back
               ocrZoneCharacters(i) = ocrCharacter
            Next
         Next

         ' Replace the characters with the modified ones before we save
         ocrPage.SetRecognizedCharacters(ocrPageCharacters)
      End Using

      ' Create an OCR document so we can save the results
      Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument(Nothing, OcrCreateDocumentOptions.AutoDeleteFile)
         ' Add the page and dispose it
         ocrDocument.Pages.Add(ocrPage)
         ocrPage.Dispose()

         ' Set the PDF options to save as PDF/A text only
         Dim pdfOptions As PdfDocumentOptions = TryCast(ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf), PdfDocumentOptions)
         pdfOptions.DocumentType = PdfDocumentType.PdfA
         pdfOptions.ImageOverText = False
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions)

         ' Open and check the result file, it should contain the following text
         ' "Normal Line"
         ' "Bold, Italic And Underline"
         ' "Monospaced Line"
         ' With the second line bold, italic and underlined now
         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, Nothing)
      End Using

      ' Shutdown the engine
      ' Note: calling Dispose will also automatically shutdown the engine if it has been started
      ocrEngine.Shutdown()
   End Using
End Sub

Public NotInheritable Class LEAD_VARS
   Public Const ImagesDir As String = "C:\LEADTOOLS21\Resources\Images"
   Public Const OcrLEADRuntimeDir As String = "C:\LEADTOOLS21\Bin\Common\OcrLEADRuntime"
End Class
