Leadtools.Forms.Ocr | Requires Document/Medical product license | Help Version 16.5.9.25
OcrXmlOutputOptions Enumeration



Controls the format of the XML data obtained from IOcrDocument.SaveXml.

Syntax

Visual Basic (Declaration) 
<FlagsAttribute()>
<SerializableAttribute()>
Public Enum OcrXmlOutputOptions 
   Inherits Enum
Visual Basic (Usage)
Dim instance As OcrXmlOutputOptions
C# 
[FlagsAttribute()]
[SerializableAttribute()]
public enum OcrXmlOutputOptions : Enum 
C++/CLI 
[FlagsAttribute()]
[SerializableAttribute()]
public enum class OcrXmlOutputOptions : public Enum 

Members

Member | Description
None | Default. Write the recognized word values to the result XML data.
Characters | Write the recognized character values, instead of the word values, to the result XML data.
CharacterAttributes | Valid only when combined with Characters. Also write the character attributes (for example, the font) to the result XML data.


Remarks

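The various IOcrDocument.SaveXml methods accept a combination of one or more OcrXmlOutputOptions members to control the format of the output XML data. For example, OcrXmlOutputOptions.Characters | OcrXmlOutputOptions.CharacterAttributes writes every recognized character together with its font attributes.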

The format of the result XML data is as follows:


<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<pages>
  <page>
    <zone>
      <paragraph>
        <line>
          <word>
            <character/>
            <character/>
          </word>
        </line>
      </paragraph>
    </zone>
  </page>
</pages>
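The nesting above can be walked with LINQ to XML (System.Xml.Linq, available in .NET 3.5 and later). The following is a minimal sketch, not part of the LEADTOOLS API: the class name OcrXmlSummary is hypothetical, and it assumes the XML written by IOcrDocument.SaveXml has already been loaded into an XDocument (for example, with XDocument.Load). It counts the elements at each level of the hierarchy.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of LEADTOOLS): reports how many elements
// exist at each level of the SaveXml hierarchy.
public static class OcrXmlSummary
{
    static readonly string[] Levels =
        { "page", "zone", "paragraph", "line", "word", "character" };

    public static string Summarize(XDocument doc)
    {
        // Descendants(name) searches the whole tree, so each level is
        // counted across all pages of the document.
        return string.Join(", ",
            Levels.Select(name => name + "=" + doc.Descendants(name).Count())
                  .ToArray());
    }
}
```

A character count of zero is expected when OcrXmlOutputOptions.Characters was not specified, because character elements are only written in that mode.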

The pages element appears once per document. It has no value and no attributes.

The page element is repeated once for each page in the document (IOcrDocument.Pages.Count). If a page has not been recognized or contains no zones, its page element contains no child zone elements.

The page element has no value and contains the following additional attributes:

Attribute | Value
horizontal_resolution | Horizontal resolution of the page. The value is IOcrPage.DpiX.
vertical_resolution | Vertical resolution of the page. The value is IOcrPage.DpiY.
width | Width of the page in pixels. The value is IOcrPage.Width.
height | Height of the page in pixels. The value is IOcrPage.Height.
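Because width and height are in pixels and the resolutions are in dots per inch, the physical page size follows from dividing one by the other on each axis. This is a minimal sketch with a hypothetical class name (OcrXmlPages); it assumes a page element taken from loaded SaveXml output and uses only System.Xml.Linq.

```csharp
using System.Xml.Linq;

// Hypothetical helper (not part of LEADTOOLS): converts a <page> element's
// pixel size to inches using its resolution attributes.
public static class OcrXmlPages
{
    public static void SizeInInches(XElement page,
                                    out double widthInches, out double heightInches)
    {
        // The explicit (double) conversions parse the attribute text and
        // throw ArgumentNullException if an attribute is missing.
        double width = (double)page.Attribute("width");
        double height = (double)page.Attribute("height");
        double dpiX = (double)page.Attribute("horizontal_resolution");
        double dpiY = (double)page.Attribute("vertical_resolution");

        // inches = pixels / dots per inch, computed per axis because the
        // horizontal and vertical resolutions can differ.
        widthInches = width / dpiX;
        heightInches = height / dpiY;
    }
}
```

For the sample page shown later in this topic (2544 x 3294 pixels at 300 DPI), this gives 8.48 x 10.98 inches, roughly US Letter size.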

The zone element is repeated for every zone in the current page (IOcrPage.Zones). The zone element has no value and contains the following additional attributes:

Attribute | Value
type | The zone type, either "text" or "graphics". A "text" zone contains zero or more child paragraph elements; a "graphics" zone contains no child elements.
left | The left position of the zone in pixels. The value is OcrZone.Bounds.Left converted to pixels.
top | The top position of the zone in pixels. The value is OcrZone.Bounds.Top converted to pixels.
right | The right position of the zone in pixels. The value is OcrZone.Bounds.Right converted to pixels.
bottom | The bottom position of the zone in pixels. The value is OcrZone.Bounds.Bottom converted to pixels.
subtype | The zone type as defined by OcrZone.ZoneType.
recognition_module | The zone recognition module. The value is OcrZone.RecognitionModule.
fill_method | The zone fill method. The value is OcrZone.FillMethod.
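Only text zones carry recognized content, so a consumer typically skips graphics zones. The following minimal sketch (hypothetical class OcrXmlZones, not a LEADTOOLS API) collects the pixel bounds of each text zone. It compares the type value without regard to case, since the description above lists "text" while the sample output later in this topic writes "Text".

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper (not part of LEADTOOLS): collects the bounds of text
// zones as { left, top, right, bottom }, skipping graphics zones.
public static class OcrXmlZones
{
    public static List<int[]> TextZoneBounds(XDocument doc)
    {
        var result = new List<int[]>();
        foreach (XElement zone in doc.Descendants("zone"))
        {
            // Case-insensitive match covers both "text" and "Text".
            string type = (string)zone.Attribute("type");
            if (!string.Equals(type, "text", StringComparison.OrdinalIgnoreCase))
                continue;

            // The zone position attributes are already in pixels.
            result.Add(new[]
            {
                (int)zone.Attribute("left"), (int)zone.Attribute("top"),
                (int)zone.Attribute("right"), (int)zone.Attribute("bottom")
            });
        }
        return result;
    }
}
```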

The paragraph element is repeated for every text paragraph in the current zone. It has no value and no attributes. If the zone has no recognized text, the paragraph element contains no child line elements.

The line element is repeated for every line of text in the current paragraph. The line element has no value and contains the following additional attributes:

Attribute | Value
left | The left position of the line in pixels.
top | The top position of the line in pixels.
right | The right position of the line in pixels.
bottom | The bottom position of the line in pixels.
base | The position of the baseline of the line, calculated from the baselines of all the words in the line.

The left, top, right, and bottom values form the bounding rectangle that encloses all the words in the line.
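In the sample output later in this topic, each line's box equals the smallest rectangle enclosing its words, so this sketch interprets the combined word boundaries as their bounding rectangle. The class OcrXmlBounds is hypothetical and not part of LEADTOOLS; it computes that union and checks a parent element against it.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper (not part of LEADTOOLS): compares an element's box
// with the smallest rectangle enclosing its children's boxes.
public static class OcrXmlBounds
{
    // Returns { left, top, right, bottom } enclosing all the given elements.
    public static int[] Union(IEnumerable<XElement> items)
    {
        int left = int.MaxValue, top = int.MaxValue;
        int right = int.MinValue, bottom = int.MinValue;
        bool any = false;
        foreach (XElement e in items)
        {
            any = true;
            left = Math.Min(left, (int)e.Attribute("left"));
            top = Math.Min(top, (int)e.Attribute("top"));
            right = Math.Max(right, (int)e.Attribute("right"));
            bottom = Math.Max(bottom, (int)e.Attribute("bottom"));
        }
        if (!any)
            throw new ArgumentException("There are no elements to enclose.", nameof(items));
        return new[] { left, top, right, bottom };
    }

    // True when the parent's box equals the union of its children named
    // childName (for example, a line and its words).
    public static bool MatchesChildren(XElement parent, string childName)
    {
        int[] u = Union(parent.Elements(childName));
        return u[0] == (int)parent.Attribute("left")
            && u[1] == (int)parent.Attribute("top")
            && u[2] == (int)parent.Attribute("right")
            && u[3] == (int)parent.Attribute("bottom");
    }
}
```

The same call works one level down, MatchesChildren(word, "character"), when OcrXmlOutputOptions.Characters was specified; both checks hold for the sample output in this topic.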

The word element is repeated for every word of text in the current line. If OcrXmlOutputOptions.Characters was not specified in the generation options, the word element contains the recognized word as its value. Otherwise, the word element has no value, and its text is carried by its child character elements.
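Because the word text lives in different places depending on OcrXmlOutputOptions.Characters, a reader that must handle both modes can check for character children first. This is a minimal sketch with a hypothetical class (OcrXmlText); joining words with a single space is an assumption, since the XML does not record the original spacing between words.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of LEADTOOLS): rebuilds plain text lines
// from SaveXml output, whether or not Characters was specified.
public static class OcrXmlText
{
    public static string WordText(XElement word)
    {
        List<XElement> chars = word.Elements("character").ToList();
        // Without Characters, the word element carries the word as its value.
        if (chars.Count == 0)
            return word.Value;
        // With Characters, the text is carried by the character children.
        return string.Concat(chars.Select(c => c.Value).ToArray());
    }

    public static List<string> LineTexts(XDocument doc)
    {
        var lines = new List<string>();
        foreach (XElement line in doc.Descendants("line"))
        {
            // Assumption: words are separated by a single space.
            string[] words = line.Elements("word").Select(w => WordText(w)).ToArray();
            lines.Add(string.Join(" ", words));
        }
        return lines;
    }
}
```

Applied to either sample output later in this topic, LineTexts returns a single line, "License Agreement".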

The word element has the following attributes:

Attribute | Value
left | The left position of the word in pixels.
top | The top position of the word in pixels.
right | The right position of the word in pixels.
bottom | The bottom position of the word in pixels.
base | The position of the baseline of the word, calculated from the baselines of all the characters in the word.

The left, top, right, and bottom values form the bounding rectangle that encloses all the characters in the word.

The character element is repeated for every character in the current word, but only if OcrXmlOutputOptions.Characters was specified in the generation options; otherwise, the word element contains no child character elements. Each character element contains the recognized character as its value.

The character element contains the following additional attributes:

Attribute | Value
left | The left position of the character in pixels.
top | The top position of the character in pixels.
right | The right position of the character in pixels.
bottom | The bottom position of the character in pixels.
base | The position of the baseline of the character. The value is OcrCharacter.Base.
confidence | The confidence of the character. The value is OcrCharacter.Confidence.
font_size | The font size in points. The value is OcrCharacter.FontSize. Only written if OcrXmlOutputOptions.CharacterAttributes is specified.
proportional | "yes" if the character font is proportional, "no" otherwise. Calculated from OcrCharacter.FontStyle. Only written if OcrXmlOutputOptions.CharacterAttributes is specified.
serif | "yes" if the character font is serif, "no" otherwise. Calculated from OcrCharacter.FontStyle. Only written if OcrXmlOutputOptions.CharacterAttributes is specified.
bold | "yes" if the character font is bold, "no" otherwise. Calculated from OcrCharacter.FontStyle. Only written if OcrXmlOutputOptions.CharacterAttributes is specified.
italic | "yes" if the character font is italic, "no" otherwise. Calculated from OcrCharacter.FontStyle. Only written if OcrXmlOutputOptions.CharacterAttributes is specified.
underline | "yes" if the character font is underlined, "no" otherwise. Calculated from OcrCharacter.FontStyle. Only written if OcrXmlOutputOptions.CharacterAttributes is specified.

The left, top, right, and bottom values are calculated from OcrCharacter.Bounds.
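Character-level output is useful for flagging uncertain recognition and for reading font styles. The following minimal sketch uses a hypothetical class (OcrXmlCharacters); the confidence threshold is left to the caller because this topic does not define a cutoff. A missing yes/no attribute, which is the case when CharacterAttributes was not specified, is reported as false, so "no" and "not written" are not distinguished.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper (not part of LEADTOOLS) for character-level output:
// requires Characters; the yes/no font attributes also require CharacterAttributes.
public static class OcrXmlCharacters
{
    // Values of the characters whose confidence is below the threshold.
    public static List<string> LowConfidence(XDocument doc, int threshold)
    {
        var result = new List<string>();
        foreach (XElement c in doc.Descendants("character"))
        {
            XAttribute confidence = c.Attribute("confidence");
            if (confidence != null && (int)confidence < threshold)
                result.Add(c.Value);
        }
        return result;
    }

    // True when a yes/no attribute such as "bold" is "yes"; false when the
    // attribute is "no" or absent.
    public static bool IsYes(XElement character, string attributeName)
    {
        return string.Equals((string)character.Attribute(attributeName), "yes",
                             StringComparison.OrdinalIgnoreCase);
    }
}
```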

The following is an example of the XML output when OcrXmlOutputOptions.None is specified:


<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<pages>
  <page horizontal_resolution="300" vertical_resolution="300" width="2544" height="3294">
    <zone type="Text" left="371" top="370" right="831" bottom="420" subtype="Text" recognition_module="Auto" fill_method="Default">
      <paragraph>
        <line left="372" top="371" right="830" bottom="419" base="29">
          <word left="372" top="371" right="554" bottom="409" base="30">License</word>
          <word left="570" top="372" right="830" bottom="419" base="29">Agreement</word>
        </line>
      </paragraph>
    </zone>
  </page>
</pages>

Here is the same XML output when OcrXmlOutputOptions.Characters is specified:


<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<pages>
  <page horizontal_resolution="300" vertical_resolution="300" width="2544" height="3294">
    <zone type="Text" left="371" top="370" right="831" bottom="420" subtype="Text" recognition_module="Auto" fill_method="Default">
      <paragraph>
        <line left="372" top="371" right="830" bottom="419" base="29">
          <word left="372" top="371" right="554" bottom="409" base="30">
            <character left="372" top="372" right="398" bottom="408" base="36" confidence="100">L</character>
            <character left="402" top="371" right="409" bottom="408" base="37" confidence="100">i</character>
            <character left="414" top="381" right="438" bottom="409" base="27" confidence="100">c</character>
            <character left="442" top="381" right="468" bottom="409" base="27" confidence="100">e</character>
            <character left="472" top="381" right="496" bottom="408" base="27" confidence="100">n</character>
            <character left="501" top="381" right="525" bottom="408" base="27" confidence="100">s</character>
            <character left="529" top="381" right="554" bottom="408" base="27" confidence="100">e</character>
          </word>
          <word left="570" top="372" right="830" bottom="419" base="29">
            <character left="570" top="372" right="604" bottom="408" base="36" confidence="100">A</character>
            <character left="607" top="381" right="633" bottom="419" base="27" confidence="100">g</character>
            <character left="639" top="381" right="655" bottom="408" base="27" confidence="100">r</character>
            <character left="657" top="381" right="682" bottom="408" base="27" confidence="100">e</character>
            <character left="685" top="381" right="710" bottom="408" base="27" confidence="100">e</character>
            <character left="715" top="381" right="753" bottom="408" base="27" confidence="100">m</character>
            <character left="758" top="381" right="783" bottom="408" base="27" confidence="100">e</character>
            <character left="788" top="381" right="812" bottom="408" base="27" confidence="100">n</character>
            <character left="815" top="374" right="830" bottom="408" base="34" confidence="100">t</character>
          </word>
        </line>
      </paragraph>
    </zone>
  </page>
</pages>

Inheritance Hierarchy

System.Object
   System.ValueType
      System.Enum
         Leadtools.Forms.Ocr.OcrXmlOutputOptions

Requirements

Target Platforms: Microsoft .NET Framework 3.0, Windows XP, Windows Server 2003 family, Windows Server 2008 family


OcrXmlOutputOptions requires an OCR module license and unlock key. For more information, refer to: Imaging Pro/Document/Medical Features