LEADTOOLS PDF (Leadtools.Pdf assembly)
LEAD Technologies, Inc

ParsePages Method

Parses objects such as text items (characters), images, rectangles, hyperlinks and fonts from one or more PDF pages.
Syntax
public void ParsePages( 
   PDFParsePagesOptions options,
   int firstPageNumber,
   int lastPageNumber
)
'Declaration
 
Public Sub ParsePages( _
   ByVal options As PDFParsePagesOptions, _
   ByVal firstPageNumber As Integer, _
   ByVal lastPageNumber As Integer _
) 
'Usage
 
Dim instance As PDFDocument
Dim options As PDFParsePagesOptions
Dim firstPageNumber As Integer
Dim lastPageNumber As Integer
 
instance.ParsePages(options, firstPageNumber, lastPageNumber)
function Leadtools.Pdf.PDFDocument.ParsePages(
   options,
   firstPageNumber,
   lastPageNumber
)
public:
void ParsePages( 
   PDFParsePagesOptions options,
   int firstPageNumber,
   int lastPageNumber
) 

Parameters

options
One or more PDFParsePagesOptions enumeration members that specify the types of objects to parse.
firstPageNumber
1-based number of the first page to parse. Must be greater than or equal to 1 and less than or equal to the number of pages in the document.
lastPageNumber
1-based number of the last page to parse. Must be greater than or equal to firstPageNumber and less than or equal to the number of pages in the document. You can use the special value -1 to denote "last page in the document".
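The page-range rules above can be expressed as a small validation step. The following is a minimal sketch, not part of the LEADTOOLS API: `PageRangeSketch.Resolve` is a hypothetical helper, and it assumes you already know the document's page count (for example, the number of items in the PDFDocument.Pages collection). It only restates the documented constraints; it does not claim to reproduce how ParsePages itself reports an invalid range.

```csharp
using System;

public static class PageRangeSketch
{
    // Hypothetical helper (not LEADTOOLS API): applies the documented
    // firstPageNumber/lastPageNumber rules and returns the concrete
    // 1-based range that the arguments describe.
    public static (int First, int Last) Resolve(int firstPageNumber, int lastPageNumber, int pageCount)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(pageCount), "The document has no pages.");

        // firstPageNumber: 1 <= first <= pageCount
        if (firstPageNumber < 1 || firstPageNumber > pageCount)
            throw new ArgumentOutOfRangeException(nameof(firstPageNumber));

        // -1 is the special value meaning "last page in the document"
        int last = lastPageNumber == -1 ? pageCount : lastPageNumber;

        // lastPageNumber: first <= last <= pageCount
        if (last < firstPageNumber || last > pageCount)
            throw new ArgumentOutOfRangeException(nameof(lastPageNumber));

        return (firstPageNumber, last);
    }
}
```

For example, `Resolve(1, -1, 5)` returns the range 1 through 5, which is the range covered by the `ParsePages(options, 1, -1)` call in this page's example.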
Remarks

When a PDFDocument object is created, the pages of the PDF document are parsed and added to the PDFDocument.Pages collection. Each page may also contain other items, such as text items (characters), images, rectangles and hyperlinks, as well as the fonts used by these items. For performance reasons, these items are not parsed automatically. Instead, you must call ParsePages with the range of pages you are interested in (or all pages) and the types of items to parse.
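If the pages you need are scattered through the document, one way to keep the number of ParsePages calls small is to group the wanted page numbers into contiguous ranges first. This is a minimal sketch under the assumption that calling ParsePages once per range is acceptable for your application; `PageRangeGrouping.ToRanges` is a hypothetical helper, not a LEADTOOLS method.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PageRangeGrouping
{
    // Hypothetical helper (not LEADTOOLS API): groups the 1-based page numbers
    // you want parsed into contiguous (First, Last) ranges, so each range can be
    // passed to one ParsePages call instead of calling it once per page.
    public static List<(int First, int Last)> ToRanges(IEnumerable<int> pageNumbers)
    {
        var ranges = new List<(int First, int Last)>();
        int[] sorted = pageNumbers.Distinct().OrderBy(p => p).ToArray();
        if (sorted.Length == 0)
            return ranges;

        int first = sorted[0];
        int last = sorted[0];
        for (int i = 1; i < sorted.Length; i++)
        {
            if (sorted[i] == last + 1)
            {
                last = sorted[i];           // still contiguous: extend the current range
            }
            else
            {
                ranges.Add((first, last));  // gap found: close the current range
                first = last = sorted[i];
            }
        }

        ranges.Add((first, last));          // close the final range
        return ranges;
    }
}
```

Each resulting `(First, Last)` pair could then be passed as the firstPageNumber and lastPageNumber arguments, for example `document.ParsePages(options, range.First, range.Last)`.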

Initially, the PDFDocumentPage.Fonts, PDFDocumentPage.Objects and PDFDocumentPage.Hyperlinks lists of each PDFDocumentPage are null (Nothing in Visual Basic). When ParsePages is called, the lists that correspond to the requested item types are populated with the items found on each page in the range.
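Since a list remains null until its item type has been parsed, code that can run before ParsePages should check for null rather than assume an empty list. A minimal sketch, assuming only the null/non-null behavior described above: `ParsedListSketch` and its methods are hypothetical helpers that accept any IList<T>, such as the value of PDFDocumentPage.Objects.

```csharp
using System.Collections.Generic;

public static class ParsedListSketch
{
    // Hypothetical helper (not LEADTOOLS API): a null list means this item type
    // has not been parsed for the page yet; a non-null list (possibly empty)
    // means it has.
    public static bool IsParsed<T>(IList<T> list) => list != null;

    // Hypothetical helper: number of parsed items, treating "not parsed yet" as 0.
    public static int CountOrZero<T>(IList<T> list) => list?.Count ?? 0;
}
```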

You choose which types of items to parse through the options parameter, of type PDFParsePagesOptions, passed to ParsePages.

White-space characters, such as spaces and tabs, are parsed by default and returned as individual objects. To skip them, OR the PDFParsePagesOptions.IgnoreWhiteSpaces member with PDFParsePagesOptions.Objects in the options parameter passed to PDFDocument.ParsePages. You can still reconstruct the words and lines of text on the page without the white-space characters by using the PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine properties; the PDFTextProperties example shows how to do that.
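The word and line reconstruction mentioned above can be sketched as follows. This is not the PDFTextProperties example referred to in the text: `TextItemSketch` is a hypothetical stand-in that holds a character plus the IsEndOfWord and IsEndOfLine flags, so the logic can run without a PDF file. It assumes the characters were parsed with IgnoreWhiteSpaces, so no space objects appear in the input. With real data, the character would presumably come from each text object (the example on this page writes it out as obj.Code) and the flags from its TextProperties.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for one parsed text character (not a LEADTOOLS type):
// the character itself plus two flags mirroring PDFTextProperties.IsEndOfWord
// and PDFTextProperties.IsEndOfLine.
public readonly struct TextItemSketch
{
    public TextItemSketch(char character, bool isEndOfWord, bool isEndOfLine)
    {
        Character = character;
        IsEndOfWord = isEndOfWord;
        IsEndOfLine = isEndOfLine;
    }

    public char Character { get; }
    public bool IsEndOfWord { get; }
    public bool IsEndOfLine { get; }
}

public static class TextRebuildSketch
{
    // Rebuilds lines of text from characters that were parsed without white space:
    // a space is inserted after each end-of-word character, and a line is closed
    // after each end-of-line character.
    public static List<string> BuildLines(IEnumerable<TextItemSketch> items)
    {
        var lines = new List<string>();
        var current = new StringBuilder();

        foreach (TextItemSketch item in items)
        {
            current.Append(item.Character);

            if (item.IsEndOfLine)
            {
                lines.Add(current.ToString());
                current.Clear();
            }
            else if (item.IsEndOfWord)
            {
                current.Append(' ');
            }
        }

        // Keep any trailing text that was not terminated by an end-of-line flag.
        if (current.Length > 0)
            lines.Add(current.ToString().TrimEnd());

        return lines;
    }
}
```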

The values of PDFParsePagesOptions can be OR'ed together.
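To illustrate how OR'ed members combine and how a combined value can be tested, the sketch below uses `ParseOptionsSketch`, a hypothetical two-member enumeration; its numeric values are assumptions and are not the values of PDFParsePagesOptions. With the real enumeration, the combination described in the previous paragraph would be written as `PDFParsePagesOptions.Objects | PDFParsePagesOptions.IgnoreWhiteSpaces` in C# (or with `Or` in Visual Basic).

```csharp
using System;

// Hypothetical mirror of two PDFParsePagesOptions members, used only to
// illustrate combining flags; the numeric values are assumptions.
[Flags]
public enum ParseOptionsSketch
{
    None = 0,
    Objects = 1,
    IgnoreWhiteSpaces = 2,
}

public static class ParseOptionsDemo
{
    // Combine two members with a bitwise OR, the same operation used to
    // combine the real enumeration members before passing them to ParsePages.
    public static ParseOptionsSketch ObjectsWithoutWhiteSpace() =>
        ParseOptionsSketch.Objects | ParseOptionsSketch.IgnoreWhiteSpaces;

    // True when every bit of 'flag' is set in 'options'.
    public static bool Includes(ParseOptionsSketch options, ParseOptionsSketch flag) =>
        (options & flag) == flag;
}
```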

Example
 
Imports System.Collections.Generic
Imports System.IO
Imports Leadtools.Pdf

   Public Sub PDFDocumentParsePagesExample()
      Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "LEAD.pdf")
      Dim txtFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "LEAD_pdf.txt")

      ' Open the document
      Using document As New PDFDocument(pdfFileName)

         ' Parse everything and for all pages
         Dim options As PDFParsePagesOptions = PDFParsePagesOptions.All
         document.ParsePages(options, 1, -1)

         ' Save the results to the text file for examining
         Using writer As StreamWriter = File.CreateText(txtFileName)
            For Each page As PDFDocumentPage In document.Pages
               writer.WriteLine("Page {0}", page.PageNumber)

               Dim fonts As IList(Of PDFFont) = page.Fonts
               ' Note, no need to check if fonts is Nothing since we passed .All
               ' This will either get the fonts or an empty list. Same for all
               ' the other objects
               writer.WriteLine("Fonts: {0}", fonts.Count)
               For Each font As PDFFont In fonts
                  writer.WriteLine("  FaceName: {0}", font.FaceName)
                  writer.WriteLine("  FontStyle: {0}", font.FontStyle.ToString())
                  writer.WriteLine("------")
               Next
               writer.WriteLine("---------------------")

               Dim objects As IList(Of PDFObject) = page.Objects
               writer.WriteLine("Objects: {0}", objects.Count)
               For Each obj As PDFObject In objects
                  writer.WriteLine("  ObjectType: {0}", obj.ObjectType.ToString())
                  writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", obj.Bounds.Left, obj.Bounds.Top, obj.Bounds.Right, obj.Bounds.Bottom)
                  WriteTextProperties(writer, obj.TextProperties)
                  writer.WriteLine("  Code: {0}", obj.Code)
                  writer.WriteLine("------")
               Next
               writer.WriteLine("---------------------")

               Dim hyperlinks As IList(Of PDFHyperlink) = page.Hyperlinks
               writer.WriteLine("Hyperlinks: {0}", hyperlinks.Count)
               For Each hyperlink As PDFHyperlink In hyperlinks
                  writer.WriteLine("  Hyperlink: {0}", hyperlink.Hyperlink)
                  writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", hyperlink.Bounds.Left, hyperlink.Bounds.Top, hyperlink.Bounds.Right, hyperlink.Bounds.Bottom)
                  WriteTextProperties(writer, hyperlink.TextProperties)
               Next
               writer.WriteLine("---------------------")
            Next
         End Using
      End Using
   End Sub

   Private Shared Sub WriteTextProperties(ByVal writer As StreamWriter, ByVal textProperties As PDFTextProperties)
      writer.WriteLine("  TextProperties.FontHeight: {0}", textProperties.FontHeight.ToString())
      writer.WriteLine("  TextProperties.FontWidth: {0}", textProperties.FontWidth.ToString())
      writer.WriteLine("  TextProperties.FontIndex: {0}", textProperties.FontIndex.ToString())
      writer.WriteLine("  TextProperties.IsEndOfWord: {0}", textProperties.IsEndOfWord.ToString())
      writer.WriteLine("  TextProperties.IsEndOfLine: {0}", textProperties.IsEndOfLine.ToString())
      writer.WriteLine("  TextProperties.Color: {0}", textProperties.Color.ToString())
   End Sub

Public NotInheritable Class LEAD_VARS
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images"
End Class

using System.Collections.Generic;
using System.IO;
using Leadtools.Pdf;

   public void PDFDocumentParsePagesExample()
   {
      string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, @"LEAD.pdf");
      string txtFileName = Path.Combine(LEAD_VARS.ImagesDir, @"LEAD_pdf.txt");

      // Open the document
      using(PDFDocument document = new PDFDocument(pdfFileName))
      {
         // Parse everything and for all pages
         PDFParsePagesOptions options = PDFParsePagesOptions.All;
         document.ParsePages(options, 1, -1);

         // Save the results to the text file for examining
         using(StreamWriter writer = File.CreateText(txtFileName))
         {
            foreach(PDFDocumentPage page in document.Pages)
            {
               writer.WriteLine("Page {0}", page.PageNumber);

               IList<PDFFont> fonts = page.Fonts;
               // Note, no need to check if fonts is null since we passed .All
               // This will either get the fonts or an empty list. Same for all
               // the other objects
               writer.WriteLine("Fonts: {0}", fonts.Count);
               foreach(PDFFont font in fonts)
               {
                  writer.WriteLine("  FaceName: {0}", font.FaceName);
                  writer.WriteLine("  FontStyle: {0}", font.FontStyle.ToString());
                  writer.WriteLine("------");
               }
               writer.WriteLine("---------------------");

               IList<PDFObject> objects = page.Objects;
               writer.WriteLine("Objects: {0}", objects.Count);
               foreach(PDFObject obj in objects)
               {
                  writer.WriteLine("  ObjectType: {0}", obj.ObjectType.ToString());
                  writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", obj.Bounds.Left, obj.Bounds.Top, obj.Bounds.Right, obj.Bounds.Bottom);
                  WriteTextProperties(writer, obj.TextProperties);
                  writer.WriteLine("  Code: {0}", obj.Code);
                  writer.WriteLine("------");
               }
               writer.WriteLine("---------------------");

               IList<PDFHyperlink> hyperlinks = page.Hyperlinks;
               writer.WriteLine("Hyperlinks: {0}", hyperlinks.Count);
               foreach(PDFHyperlink hyperlink in hyperlinks)
               {
                  writer.WriteLine("  Hyperlink: {0}", hyperlink.Hyperlink);
                  writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", hyperlink.Bounds.Left, hyperlink.Bounds.Top, hyperlink.Bounds.Right, hyperlink.Bounds.Bottom);
                  WriteTextProperties(writer, hyperlink.TextProperties);
               }
               writer.WriteLine("---------------------");
            }
         }
      }
   }

   private static void WriteTextProperties(StreamWriter writer, PDFTextProperties textProperties)
   {
      writer.WriteLine("  TextProperties.FontHeight: {0}", textProperties.FontHeight.ToString());
      writer.WriteLine("  TextProperties.FontWidth: {0}", textProperties.FontWidth.ToString());
      writer.WriteLine("  TextProperties.FontIndex: {0}", textProperties.FontIndex.ToString());
      writer.WriteLine("  TextProperties.IsEndOfWord: {0}", textProperties.IsEndOfWord.ToString());
      writer.WriteLine("  TextProperties.IsEndOfLine: {0}", textProperties.IsEndOfLine.ToString());
      writer.WriteLine("  TextProperties.Color: {0}", textProperties.Color.ToString());
   }

static class LEAD_VARS
{
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images";
}
Requirements

Target Platforms: Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows Server 2008 (Server Core not supported), Windows Server 2008 R2 (Server Core supported with SP1 or later), Windows Server 2003 SP2

See Also

Reference

PDFDocument Class
PDFDocument Members

 

 



© 2006-2012 All Rights Reserved. LEAD Technologies, Inc.