Feature Description

Several LEADTOOLS SDK products include comprehensive technology to read, write, and view PDF files. LEADTOOLS PDF technology includes advanced capabilities such as extraction of text, images, hyperlinks, and metadata; editing of bookmarks and annotations; page replacement; splitting and merging of existing PDF documents; conversion to PDF/A; linearization; and PDF document compression. Combined with advanced rasterization and image display technology, developers can take advantage of these tools to enhance their applications with dynamic document viewing, editing, and assembly features. Furthermore, programmers using .NET (C# & VB), C/C++, Java, and HTML5 can leverage state-of-the-art OCR, OMR, ICR, Forms Recognition, Virtual Printing, and scanning technologies within LEADTOOLS to create any type of document and medical imaging application that utilizes the PDF format.

Tested against thousands of PDF documents, LEADTOOLS PDF SDK technology provides accuracy that tops many market-leading PDF reading applications. LEADTOOLS accounts for common errors and differences between PDF file versions to give programmers peace of mind, shorten their testing phase, and help them create the best PDF applications on the market.

Overview of LEADTOOLS PDF SDK Technology

 PDF Document Features

  • Load and view any PDF document
  • Extract text (characters, words, and lines), fonts, annotations, rectangles, and hyperlinks with location and size
  • Extract images from PDF documents and save to any of the 150+ file formats supported by LEADTOOLS
  • Full support for reading, editing, and writing PDF annotations
  • Parse the document structure by reading and updating PDF bookmarks (table of contents) and internal links (jumps)
  • Unicode support including Chinese, Japanese, Arabic, and Hebrew character-sets
  • Generate a raster image and thumbnail of any page

 PDF File Features

  • Advanced PDF Optimizer analyzes document features to create the smallest file possible
  • Comprehensive multi-page support including:
    • Merge existing PDF files into a single PDF
    • Split a single PDF into multiple PDF files
    • Extract, delete, insert, and replace any page in existing PDF files
  • Convert any existing PDF to PDF/A
  • Convert between PDF versions
  • Convert (distill) PostScript to PDF with optimization for eBook, screen, and prepress
  • Convert PDF to vector SVG
  • Linearize (optimize for web viewing) any existing PDF
  • Create auto-print PDF files
  • Read, write and update the PDF document Table of Contents
  • Read, write and update all PDF metadata such as author, title, subject, keywords, and initial view
  • Read and write PDF Digital Signatures
  • Encrypt and decrypt documents

 PDF Annotations and Markup

LEADTOOLS supports reading, displaying, editing, and writing PDF annotations and markups that work seamlessly with Adobe Acrobat and other compliant PDF readers. Annotation is an important feature in document imaging, as it allows users to communicate with each other by writing comments and drawing shapes on top of the document without making permanent changes.

  • Support for all PDF annotation and markup objects
    • Arrow
    • Comment
    • Highlight
    • Line
    • Review
    • Shapes
    • Text
  • Options to control annotation rendering when loading PDF as raster with support for No Appearance Stream annotations
  • Convert PDF annotations to and from LEADTOOLS annotations for live editing
  • Fully functional sample application with source code that implements all of the PDF reading, writing, editing, and annotation features

 OCR PDF Output

LEADTOOLS allows developers to easily convert any image into a searchable text PDF. Searchable text PDFs are generally smaller in size than the comparable raster image and the embedded text can be searched, indexed, and edited.

  • Convert images to searchable text PDF files with as little as three lines of code using LEADTOOLS SDK OCR technology
  • Export as text to get only the text, or as image-over-text to retain the original formatting
  • Multiple PDF versions and flavors including 1.2 - 1.7 and PDF/A
  • Multiple compression options for images within the PDF including:
    • JPEG
    • JPEG 2000
    • CCITT G3/G4
    • JBIG2
    • LZW
    • MRC
  • Convert entire file or only specified pages
  • Create and update PDF document metadata such as author, title, and keywords
  • Protect sensitive data with encrypted PDF documents using RC4 40-bit and RC4 128-bit encryption
  • Control access to the PDF document with User and Owner passwords
  • Options to embed fonts in the PDF file
  • Options to create linearized PDF files for faster web viewing
  • Convert images from disk, memory, Internet, and SharePoint

 Raster Image PDF Features

In addition to handling text-based PDF files, LEADTOOLS fully supports loading, saving, and editing raster image PDFs. This includes rasterizing any text and image-based PDF into thumbnails and full-size document images, as well as converting single and multi-page image formats such as JPEG and TIFF into image-based PDF files.

  • Convert any PDF file to and from more than 150 supported raster image formats
  • Multiple PDF versions and flavors including 1.2 - 1.7 and PDF/A
  • Multiple compression options including:
    • JPEG
    • JPEG 2000
    • CCITT G3/G4
    • JBIG2
    • LZW
    • MRC
  • Specify RGB or CMYK color space
  • Convert entire file or only specified pages
  • Encrypt and decrypt PDF documents using RC4 40-bit and RC4 128-bit encryption
  • Control access to the PDF document with User and Owner passwords
  • Load PDF from disk, memory, Internet, and SharePoint

 PDF Rasterization Options

At the heart of PDF-to-image conversion is the rasterization process. By nature, PDF documents are made up of vector objects such as text and 2D images. These objects have a relative location based on the physical, printed dimensions. This means that PDFs are dynamic documents that can be rasterized to any pixel dimension based on the DPI (dots per inch) while preserving a high-quality display. LEADTOOLS provides maximum flexibility when rasterizing PDF files and allows the developer to control the quality, size, color, and more.

  • Automatically detect the best rasterization options by examining the contents of the PDF
  • Load at any DPI to control overall quality and file size
  • Load at 1, 8, or 24 bits per pixel
  • Render fonts with 2- and 4-bit anti-aliasing, resulting in a more readable image
  • Display CIDFonts not embedded in the PDF file
  • Detect original DPI of embedded raster images
  • Rescale embedded graphics with 2- and 4-bit anti-aliasing to retain original image quality and reduce graininess
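
The relationship between DPI, pixel dimensions, and bit depth described above comes down to simple arithmetic. The following is a minimal sketch in plain Python, not LEADTOOLS code. It assumes page sizes are given in PDF points (1/72 inch, the default PDF user-space unit); the helper names raster_size and raw_bytes are hypothetical and exist only for this illustration.

```python
import math

POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def raster_size(width_pt, height_pt, dpi):
    """Return (width, height) in pixels for a page rasterized at the given DPI."""
    width_px = round(width_pt / POINTS_PER_INCH * dpi)
    height_px = round(height_pt / POINTS_PER_INCH * dpi)
    return width_px, height_px


def raw_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed image size in bytes, with each row padded to a whole byte."""
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return bytes_per_row * height_px


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    for dpi in (96, 150, 300):
        w, h = raster_size(612, 792, dpi)
        sizes = {bpp: raw_bytes(w, h, bpp) for bpp in (1, 8, 24)}
        print(dpi, (w, h), sizes)
```

For a US Letter page at 300 DPI this gives 2550 x 3300 pixels: roughly 25 MB uncompressed at 24 bits per pixel and roughly 1 MB at 1 bit per pixel, which is why DPI and bit depth together drive both quality and file size.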

 Vector PDF Features

To get the most out of viewing and converting PDF documents, LEADTOOLS provides specialized frameworks to handle PDF files as vector-based documents. Only available in the Document and Medical Imaging product families, these .NET and HTML5/JavaScript frameworks provide the ultimate experience to develop PDF applications with LEADTOOLS.

 PDF Forms

LEADTOOLS provides developers with everything they need to create applications that work with existing PDF forms. Any field can be read, filled, and saved.

  • Read PDF form field information such as location and type
  • Enter data into PDF form fields
  • Extract data from filled forms and save as XML
  • Supports both Acrobat Forms Data Format (FDF) and Adobe XML Forms Architecture (XFA)

 PDF Compression

Maintain quality while maximizing PDF compression with LEADTOOLS advanced image segmentation and compression technologies. The resulting compressed PDF can be loaded and viewed in any PDF viewer that supports standard PDF files. By storing complex mixed raster content (MRC), this process creates PDF files with better compression and quality than a standard raster PDF file.

  • Automatically segment the image with optimization options
  • Manually segment the image to take full control over file size and image quality optimization
  • Multiple compression options including:
    • JPEG
    • JPEG 2000
    • CCITT G3/G4
    • JBIG2
    • LZW
    • MRC
  • Automatic background detection
  • Compress single and multi-page PDF files
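
The layer idea behind mixed raster content can be shown with a toy example. The sketch below is plain Python and is not the LEADTOOLS segmentation algorithm: it assumes a grayscale image stored as rows of 0-255 values and uses a fixed threshold, whereas real segmenters are far more sophisticated. The function names segment and recombine are hypothetical. The point is only that a 1-bit mask selects, per pixel, between a foreground layer and a background layer, and that each layer can then be handed to the compressor that suits it best.

```python
def segment(pixels, threshold=128):
    """Split a grayscale image into (mask, foreground, background) layers."""
    flat = [p for row in pixels for p in row]
    dark = [p for p in flat if p < threshold]
    light = [p for p in flat if p >= threshold]
    # Fill values keep the unused parts of each layer flat, so they compress well.
    fg_fill = round(sum(dark) / len(dark)) if dark else 0
    bg_fill = round(sum(light) / len(light)) if light else 255

    # mask is 1 where the pixel belongs to the foreground (dark content such as text)
    mask = [[1 if p < threshold else 0 for p in row] for row in pixels]
    foreground = [[p if p < threshold else fg_fill for p in row] for row in pixels]
    background = [[p if p >= threshold else bg_fill for p in row] for row in pixels]
    return mask, foreground, background


def recombine(mask, foreground, background):
    """Rebuild the image: take the foreground where mask is 1, else the background."""
    return [
        [f if m else b for m, f, b in zip(mrow, frow, brow)]
        for mrow, frow, brow in zip(mask, foreground, background)
    ]


if __name__ == "__main__":
    page = [[250, 10, 245], [20, 240, 30]]
    mask, fg, bg = segment(page)
    print(mask)
    print(recombine(mask, fg, bg) == page)
```

In this toy version the layers recombine to the original exactly; in practice the mask would typically go to a bitonal codec and the two layers to lossy codecs, trading a small loss in the layers for a much smaller file.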

Explanation of PDF File Types

In general, PDF and PDF/A files can be categorized into two basic types: raster image and searchable text. Raster image PDFs consist of a complete raster image in a PDF wrapper and support multiple compression types including JPEG, JPEG 2000, CCITT G3/G4, JBIG2, LZW, and MRC. The greatest advantage of raster image PDFs is that they appear identical to the original document. On the other hand, searchable text PDFs are often smaller in size and the text can be searched and edited.

When converting from raster images to searchable text-based PDFs, the formatting of the original image is often modified. To alleviate this concern, LEAD has implemented a hybrid type of PDF known as "image over text". In image-over-text PDF files, the text is formatted as usual, but the original raster image is overlaid on top of the text. This maintains the look and formatting of the original raster image while still allowing the text content to be searched, selected, and copied.
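
The ordering that makes image-over-text work can be sketched at the level of a PDF page content stream. The example below is plain Python that only builds the operator text for one page; it is an illustration under stated assumptions, not how LEADTOOLS writes its files. The function names and the resource names F1 and Im0 are hypothetical placeholders, and the text is assumed to be simple ASCII so that no font-encoding handling is needed. Text is written first, then the page image is painted last so that it covers the text.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def image_over_text_stream(words, page_width, page_height,
                           image_name="Im0", font_name="F1"):
    """Build a content stream: recognized text first, page image on top.

    words is an iterable of (text, x, y, font_size) tuples in PDF points.
    """
    ops = []
    for text, x, y, size in words:
        ops.append("BT")                                   # begin text object
        ops.append(f"/{font_name} {size} Tf")              # select font and size
        ops.append(f"{x} {y} Td")                          # move to the word position
        ops.append(f"({escape_pdf_string(text)}) Tj")      # show the text
        ops.append("ET")                                   # end text object
    # An image occupies a 1 x 1 unit square; scale it to the page and paint it
    # last so it sits on top of the text drawn above.
    ops.append("q")
    ops.append(f"{page_width} 0 0 {page_height} 0 0 cm")
    ops.append(f"/{image_name} Do")
    ops.append("Q")
    return "\n".join(ops)


if __name__ == "__main__":
    print(image_over_text_stream([("Hello (PDF)", 72, 720, 12)], 612, 792))
```

In a complete PDF, F1 and Im0 would be entries in the page's resource dictionary pointing at a font and at the scanned page image; a viewer shows the image, while search and copy operate on the text underneath.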

Technology Related to PDF