PDFParsePagesOptions Enumeration

Specifies options to use when parsing the objects of a PDF document.

Syntax

C#

[FlagsAttribute()]
[SerializableAttribute()]
public enum PDFParsePagesOptions : System.Enum

Visual Basic

'Declaration
 
<FlagsAttribute()>
<SerializableAttribute()>
Public Enum PDFParsePagesOptions 
   Inherits System.Enum

'Usage
 
Dim instance As PDFParsePagesOptions

WinRT C#

[FlagsAttribute()]
[SerializableAttribute()]
public enum PDFParsePagesOptions : System.Enum

WinRT JavaScript

Leadtools.Pdf.PDFParsePagesOptions = function() { };
Leadtools.Pdf.PDFParsePagesOptions.prototype = {
 };

C++

[FlagsAttribute()]
[SerializableAttribute()]
public enum class PDFParsePagesOptions : public System.Enum

Members

Member	Description
All	Parse all objects with white spaces. This is the equivalent of Objects | Hyperlinks | Fonts | Annotations.
AllIgnoreWhiteSpaces	Parse all objects without white spaces. This is the equivalent of Objects | Hyperlinks | Fonts | Annotations | IgnoreWhiteSpaces.
Annotations	Parse the annotations found in the page. Specifying this member will populate the PDFDocumentPage.Annotations collection with the annotations found in the page.
Fonts	Parse the fonts found in the page. Specifying this member will populate the PDFDocumentPage.Fonts collection with the fonts found in the page.
Hyperlinks	Parse the hyperlinks found in the page. Specifying this member will populate the PDFDocumentPage.Hyperlinks collection with the hyperlinks found in the page.
IgnoreWhiteSpaces	Must be OR'ed with Objects; otherwise it is ignored. If specified, white space characters such as a space or a tab will not be returned as items in the PDFDocumentPage.Objects collection. Instead, rely on the PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine properties if the words and lines of the page need to be reconstructed.
None	Do not parse any items.
Objects	Parse the objects of the page such as text items (characters), images and rectangles. Specifying this member will populate the PDFDocumentPage.Objects collection with the objects found in the page.

Remarks

The PDFParsePagesOptions enumeration is used as the type of the options parameter passed to the PDFDocument.ParsePages method.

When a PDFDocument object is created, the pages of the PDF document are already parsed and populated in the PDFDocument.Pages collection. Each page may contain other objects such as text items (characters), images, rectangles and hyperlinks as well as the fonts used in these items. These items are not parsed automatically for performance reasons. Instead, call the PDFDocument.ParsePages method with the page ranges you are interested in (or all pages) and the type of items to parse.
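As a minimal sketch of the call described above, the following C# fragment opens a document and parses only its first two pages. It assumes a PDFDocument constructor that takes a file name, that PDFDocument is disposable, and a ParsePages(options, firstPageNumber, lastPageNumber) overload with 1-based page numbers. None of these signatures are stated in this topic, so verify them against the PDFDocument and PDFDocument.ParsePages topics. The file path is a placeholder, and any toolkit setup your application normally performs is omitted.

```csharp
using System;
using Leadtools.Pdf;

static class ParsePagesSketch
{
   static void Main()
   {
      // Placeholder path; replace with a real PDF file.
      string fileName = @"C:\Temp\Sample.pdf";

      // Assumption: PDFDocument(string fileName) exists and the class
      // implements IDisposable.
      using (PDFDocument document = new PDFDocument(fileName))
      {
         // Pages is already populated here; the per-page item lists are not.
         int lastPage = Math.Min(2, document.Pages.Count);

         // Assumption: ParsePages(options, firstPageNumber, lastPageNumber)
         // with 1-based page numbers.
         PDFParsePagesOptions options =
            PDFParsePagesOptions.Objects | PDFParsePagesOptions.Hyperlinks;
         document.ParsePages(options, 1, lastPage);

         Console.WriteLine("Parsed pages 1 to {0}", lastPage);
      }
   }
}
```

Restricting both the page range and the options keeps the parsing cost proportional to what the application actually reads.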

Initially, the values of the PDFDocumentPage.Fonts, PDFDocumentPage.Objects and PDFDocumentPage.Hyperlinks lists of each PDFDocumentPage will be set to null (Nothing in Visual Basic). When the PDFDocument.ParsePages method is called, the corresponding list will be populated with the items found in the page.

You can parse any type of item you are interested in. This is done through the options parameter of type PDFParsePagesOptions passed to PDFDocument.ParsePages. The different options and results are as follows:

If PDFParsePagesOptions.Objects is specified, then the PDFDocumentPage.Objects collection will be populated with a PDFObject object for each object item found in the page. These items can be text (characters), images or rectangles. If there aren't any object items found in the page, then the PDFDocumentPage.Objects will be initialized with an empty collection (PDFDocumentPage.Objects.Count will be 0).
If PDFParsePagesOptions.Hyperlinks is specified, then the PDFDocumentPage.Hyperlinks collection will be populated with a PDFHyperlink object for each hyperlink item found in the page. If no hyperlinks are found in the page, PDFDocumentPage.Hyperlinks will be initialized with an empty collection (PDFDocumentPage.Hyperlinks.Count will be 0).
If PDFParsePagesOptions.Fonts is specified, then the PDFDocumentPage.Fonts collection will be populated with a PDFFont object for each font item found in the page. If no fonts are found in the page, PDFDocumentPage.Fonts will be initialized with an empty collection (PDFDocumentPage.Fonts.Count will be 0).
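The difference between a list that was never requested (null) and one that was parsed but came back empty (Count of 0) can be checked with a small helper such as the sketch below. The class and method names are hypothetical. The helper is generic over ICollection<T> because this topic does not state the exact collection types of the three properties; adjust the parameter type if your version of the library differs.

```csharp
using System;
using System.Collections.Generic;
using Leadtools.Pdf;

static class PageItemReport
{
   // null means the matching option was not passed to ParsePages;
   // an empty collection means the page was parsed and had no such items.
   static string Describe<T>(ICollection<T> items)
   {
      if (items == null)
         return "not parsed";

      return items.Count == 0 ? "parsed, none found" : items.Count + " item(s)";
   }

   // Call this for a page after PDFDocument.ParsePages has run.
   public static void Print(PDFDocumentPage page)
   {
      Console.WriteLine("Objects:    " + Describe(page.Objects));
      Console.WriteLine("Hyperlinks: " + Describe(page.Hyperlinks));
      Console.WriteLine("Fonts:      " + Describe(page.Fonts));
   }
}
```

For example, after parsing with only PDFParsePagesOptions.Objects, the Hyperlinks and Fonts lines would be expected to report "not parsed", following the null behavior described above.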

White space characters such as a space or a tab are parsed by default and returned as individual objects. You can stop this behavior by OR'ing the PDFParsePagesOptions.IgnoreWhiteSpaces enumeration member with PDFParsePagesOptions.Objects in the options parameter passed to PDFDocument.ParsePages. Note that you can reconstruct the words and lines of text in the page without white space characters by using the PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine properties. The example of PDFTextProperties shows how to do that.
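The reconstruction idea can be sketched independently of the library. The hypothetical WordRebuilder class below works on plain tuples of a character and its two flags, so it runs without the toolkit. In real use, each tuple would be filled from a text item in PDFDocumentPage.Objects and its PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine values; how the character code is read from a PDFObject is not covered in this topic and is left to the PDFObject reference.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class WordRebuilder
{
   // Each tuple is one parsed text character together with the
   // end-of-word and end-of-line flags reported for it.
   public static List<string> RebuildLines(
      IEnumerable<(char Code, bool IsEndOfWord, bool IsEndOfLine)> characters)
   {
      var lines = new List<string>();
      var line = new StringBuilder();

      foreach (var c in characters)
      {
         line.Append(c.Code);

         if (c.IsEndOfLine)
         {
            // The line is complete; start a new one.
            lines.Add(line.ToString());
            line.Clear();
         }
         else if (c.IsEndOfWord)
         {
            // Re-insert the space that IgnoreWhiteSpaces suppressed.
            line.Append(' ');
         }
      }

      // Keep trailing characters that never reached an end-of-line flag.
      if (line.Length > 0)
         lines.Add(line.ToString().TrimEnd());

      return lines;
   }
}
```

Joining with a single space is a simplification: the original white space may have been a tab or several spaces, and that information is not recoverable from the two flags alone.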

The values of PDFParsePagesOptions can be OR'ed together.
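Because the enumeration carries FlagsAttribute, the usual C# bitwise operators apply. The fragment below is a sketch that uses only the members listed in this topic; the class name is hypothetical, and it assumes each individual member occupies its own bit, which the All and AllIgnoreWhiteSpaces descriptions imply but this topic does not spell out numerically.

```csharp
using System;
using Leadtools.Pdf;

static class OptionsSketch
{
   static void Main()
   {
      // Combine members with the bitwise OR operator.
      PDFParsePagesOptions options =
         PDFParsePagesOptions.Objects |
         PDFParsePagesOptions.Fonts |
         PDFParsePagesOptions.IgnoreWhiteSpaces;

      // Test for a member with a bitwise AND.
      bool parsesFonts =
         (options & PDFParsePagesOptions.Fonts) == PDFParsePagesOptions.Fonts;
      bool parsesHyperlinks =
         (options & PDFParsePagesOptions.Hyperlinks) == PDFParsePagesOptions.Hyperlinks;

      // Clear a member again with AND NOT.
      options &= ~PDFParsePagesOptions.IgnoreWhiteSpaces;

      Console.WriteLine("Fonts: {0}, Hyperlinks: {1}, Options: {2}",
         parsesFonts, parsesHyperlinks, options);
   }
}
```

Remember that IgnoreWhiteSpaces has an effect only when Objects is also present in the combined value.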

Example

For an example, refer to PDFDocument.ParsePages, PDFDocumentPage and PDFObject.

Inheritance Hierarchy

System.Object
   System.ValueType
      System.Enum
         Leadtools.Pdf.PDFParsePagesOptions

Requirements

Target Platforms: Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows Server 2008 (Server Core not supported), Windows Server 2008 R2 (Server Core supported with SP1 or later), Windows Server 2003 SP2

Reference

Leadtools.Pdf Namespace