PDFParsePagesOptions Enumeration

Summary

Specifies which options to use when parsing the objects of a PDF document.

Syntax

C#

    [SerializableAttribute()]
    [FlagsAttribute()]
    public enum PDFParsePagesOptions

VB

    <FlagsAttribute()>
    <SerializableAttribute()>
    Public Enum PDFParsePagesOptions

Java

    public enum PDFParsePagesOptions

C++

    [FlagsAttribute()]
    [SerializableAttribute()]
    public enum class PDFParsePagesOptions

Members

| Value | Member | Description |
| --- | --- | --- |
| 0x00000000 | None | Do not parse any items. |
| 0x00000001 | Objects | Parse the objects of the page, such as text items (characters), images and rectangles. Specifying this member populates the PDFDocumentPage.Objects collection with the objects found in the page. |
| 0x00000002 | Hyperlinks | Parse the hyperlinks found in the page. Specifying this member populates the PDFDocumentPage.Hyperlinks collection with the hyperlinks found in the page. |
| 0x00000004 | Fonts | Parse the fonts found in the page. Specifying this member populates the PDFDocumentPage.Fonts collection with the fonts found in the page. |
| 0x00000008 | IgnoreWhiteSpaces | Must be OR'ed with Objects (otherwise it is ignored). If specified, white space characters such as spaces or tabs are not returned as items in the PDFDocumentPage.Objects collection. Use PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine to reconstruct the page words and lines as needed. |
| 0x00000010 | Annotations | Parse the annotations found in the page. Specifying this member populates the PDFDocumentPage.Annotations collection with any annotations found in the page. |
| 0x00000020 | RTLOriginal | Parse characters right to left as they are stored in the page. |
| 0x00000040 | RTLFlipBrackets | Flip bracket characters for right to left text when parsing the page. |
| 0x00000080 | InternalLinks | Parse all internal links found in the page. This is the equivalent of calling PDFDocument.ParseDocumentStructure with the PDFParseDocumentStructureOptions.InternalLinks option. |
| 0x00000100 | FormFields | Parse the form fields found in the page. Specifying this member populates the PDFDocumentPage.FormFields collection with the PDF form fields found in the page. |
| 0x00000200 | Signatures | Parse the digital signatures found in the page. Specifying this member populates the PDFDocumentPage.Signatures collection with the PDF digital signatures found in the page. |
| 0x00000317 | All | Parse all objects with white spaces. This is the equivalent of OR'ing together Objects, Hyperlinks, Fonts, Annotations, FormFields and Signatures. |
| 0x0000031F | AllIgnoreWhiteSpaces | Parse all objects without white spaces. This is the equivalent of OR'ing together Objects, Hyperlinks, Fonts, Annotations, FormFields, Signatures and IgnoreWhiteSpaces. |

Remarks

The PDFParsePagesOptions enumeration is used as the type of the options parameter passed to the PDFDocument.ParsePages method.

When a PDFDocument object is created, the pages of the PDF document are already parsed and populated in the PDFDocument.Pages collection. Each page can contain other objects such as text items (characters), images, rectangles, hyperlinks, annotations, form fields and digital signatures as well as the fonts used in these items. These items are not parsed automatically for performance reasons. Instead, call the PDFDocument.ParsePages method with the page ranges you are interested in (or all pages) and the type of items to parse.

Initially, the PDFDocumentPage.Fonts, PDFDocumentPage.Objects, PDFDocumentPage.Hyperlinks, PDFDocumentPage.Annotations, PDFDocumentPage.FormFields and PDFDocumentPage.Signatures lists of each PDFDocumentPage are set to null. When the PDFDocument.ParsePages method is called, each list that corresponds to a requested option is populated with the items found in the page.
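Because the lists start out as null, code that reads them should account for lists that were never requested. The following is a minimal standalone sketch of that pattern. The LazyPageSketch and Page types, the parse method and the placeholder item are hypothetical stand-ins written for illustration; they are not the LEADTOOLS classes, and only the two flag values are taken from the Members section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "null until parsed"; not the LEADTOOLS API.
final class LazyPageSketch {
    // Values copied from the Members section.
    static final int OBJECTS = 0x001;
    static final int HYPERLINKS = 0x002;

    // Stand-in for a page whose lists are null until requested.
    static final class Page {
        List<String> objects;     // stays null unless OBJECTS is requested
        List<String> hyperlinks;  // stays null unless HYPERLINKS is requested

        // Assumed behavior: only the lists matching the requested flags are created.
        void parse(int options) {
            if ((options & OBJECTS) != 0) {
                objects = new ArrayList<>();
                objects.add("A"); // placeholder item for the sketch
            }
            if ((options & HYPERLINKS) != 0) {
                hyperlinks = new ArrayList<>();
            }
        }
    }

    // Null-safe count, so callers do not have to know which flags were used.
    static int countOrZero(List<String> items) {
        return items == null ? 0 : items.size();
    }

    public static void main(String[] args) {
        Page page = new Page();
        page.parse(OBJECTS);
        System.out.println("objects: " + countOrZero(page.objects));
        System.out.println("hyperlinks: " + countOrZero(page.hyperlinks));
    }
}
```

Whether the real library returns an empty list or leaves the list null when a requested item type is simply absent from the page is not stated here, so the null-safe helper covers both cases.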

Any type of item can be parsed. The items to parse are selected through the options parameter of type PDFParsePagesOptions passed to PDFDocument.ParsePages. The available options and their results are listed in the Members section.

White space characters such as spaces or tabs are parsed by default and returned as individual objects. To stop this behavior, OR the PDFParsePagesOptions.IgnoreWhiteSpaces enumeration member with PDFParsePagesOptions.Objects in the options parameter passed to PDFDocument.ParsePages. Note that the words and lines of text in the page can still be reconstructed without the white space characters by using the PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine properties. The PDFTextProperties example shows how to do that.
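The reconstruction described above can be sketched as a single pass over the parsed characters. This is a standalone illustration under stated assumptions: ParsedChar is a hypothetical stand-in that carries a character code plus two booleans modeled on IsEndOfWord and IsEndOfLine, and it is assumed that the last character of a line has the end-of-line flag set. It does not use the LEADTOOLS types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: rebuild text lines when white space items were not returned.
final class TextRebuildSketch {
    // Stand-in for one parsed character and its two text flags.
    static final class ParsedChar {
        final char code;
        final boolean endOfWord;
        final boolean endOfLine;

        ParsedChar(char code, boolean endOfWord, boolean endOfLine) {
            this.code = code;
            this.endOfWord = endOfWord;
            this.endOfLine = endOfLine;
        }
    }

    // Appends each character, inserts a space after a word end,
    // and closes the current line after a line end.
    static List<String> rebuildLines(List<ParsedChar> chars) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (ParsedChar c : chars) {
            line.append(c.code);
            if (c.endOfLine) {
                lines.add(line.toString());
                line.setLength(0);
            } else if (c.endOfWord) {
                line.append(' ');
            }
        }
        // Keep any trailing text that was not closed by an end-of-line flag.
        if (line.length() > 0) {
            lines.add(line.toString().trim());
        }
        return lines;
    }

    // Sample input spelling "Hi you" on one line and "ok" on the next.
    static List<ParsedChar> sample() {
        return Arrays.asList(
            new ParsedChar('H', false, false),
            new ParsedChar('i', true, false),
            new ParsedChar('y', false, false),
            new ParsedChar('o', false, false),
            new ParsedChar('u', true, true),
            new ParsedChar('o', false, false),
            new ParsedChar('k', true, true));
    }

    public static void main(String[] args) {
        for (String line : rebuildLines(sample())) {
            System.out.println(line);
        }
    }
}
```

The end-of-line check comes first so that a character flagged as both end of word and end of line does not leave a trailing space on the finished line.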

The values of PDFParsePagesOptions can be OR'ed together.
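As a quick check of how the values combine, the sketch below mirrors the numeric values from the Members section as plain int constants and verifies the two composite members. The ParseFlagsSketch class and its has and whiteSpaceSkipped helpers are hypothetical names introduced for illustration; this is a standalone model of the bit arithmetic, not the LEADTOOLS enumeration itself.

```java
// Standalone model of the documented flag values; not the LEADTOOLS type.
final class ParseFlagsSketch {
    static final int NONE                    = 0x000;
    static final int OBJECTS                 = 0x001;
    static final int HYPERLINKS              = 0x002;
    static final int FONTS                   = 0x004;
    static final int IGNORE_WHITE_SPACES     = 0x008;
    static final int ANNOTATIONS             = 0x010;
    static final int RTL_ORIGINAL            = 0x020;
    static final int RTL_FLIP_BRACKETS       = 0x040;
    static final int INTERNAL_LINKS          = 0x080;
    static final int FORM_FIELDS             = 0x100;
    static final int SIGNATURES              = 0x200;
    static final int ALL                     = 0x317;
    static final int ALL_IGNORE_WHITE_SPACES = 0x31F;

    // True when every bit of 'flag' is set in 'options'.
    static boolean has(int options, int flag) {
        return (options & flag) == flag;
    }

    // Per the Members section, IgnoreWhiteSpaces is ignored unless Objects is also set.
    static boolean whiteSpaceSkipped(int options) {
        return has(options, OBJECTS) && has(options, IGNORE_WHITE_SPACES);
    }

    public static void main(String[] args) {
        int combined = OBJECTS | HYPERLINKS | FONTS | ANNOTATIONS | FORM_FIELDS | SIGNATURES;
        System.out.println(combined == ALL);                                        // true
        System.out.println((ALL | IGNORE_WHITE_SPACES) == ALL_IGNORE_WHITE_SPACES); // true
        System.out.println(has(ALL, INTERNAL_LINKS));                               // false
        System.out.println(whiteSpaceSkipped(HYPERLINKS | IGNORE_WHITE_SPACES));    // false
    }
}
```

The arithmetic shows that All does not include RTLOriginal, RTLFlipBrackets or InternalLinks, so those members have to be OR'ed in explicitly when they are needed.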

Example

For an example, refer to PDFDocument.ParsePages, PDFDocumentPage and PDFObject.

Requirements

Target Platforms

See Also

Reference

Leadtools.Pdf Namespace

Help Version 19.0.2017.10.27
© 1991-2017 LEAD Technologies, Inc. All Rights Reserved.

Leadtools.Pdf Assembly