PDF files are ubiquitous in business, where they are used to store documents, images, and other data. A common question is whether there is an easy way to extract graphics, such as a chart or a photo, from a PDF file. If you need every image from a PDF file, or have hundreds of PDF files to process, LEADTOOLS can do it.
Extracting images that are embedded in a PDF file is easy with LEADTOOLS. Below are C#, Java, and PowerShell code samples that use LEADTOOLS to extract images from a PDF file.
C# code to extract images embedded in a PDF
using System.IO;
using System.Linq;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Pdf;

/// <summary>
/// Extracts image objects embedded in a PDF file and saves them as TIFF
/// </summary>
/// <param name="pdfPath">Path to the PDF file to read</param>
private static void ExtractImagesFromPdf(string pdfPath)
{
    var destinationPath = Path.Combine(Path.GetDirectoryName(pdfPath), "images");
    // Make sure the output folder exists before anything is saved to it
    Directory.CreateDirectory(destinationPath);
    var documentName = Path.GetFileNameWithoutExtension(pdfPath);

    using var pdfDocument = new PDFDocument(pdfPath);
    // Parse the objects on every page, from page 1 to the last page (-1)
    pdfDocument.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

    // One RasterCodecs instance is enough for the whole document
    using var codecs = new RasterCodecs();
    foreach (var page in pdfDocument.Pages)
    {
        var embeddedImages = page.Objects.Where(o => o.ObjectType == PDFObjectType.Image);
        foreach (var imgObj in embeddedImages)
        {
            var fileName = $"{documentName}~page-{page.PageNumber}~{imgObj.ImageObjectNumber}.tif";
            var destinationFilePath = Path.Combine(destinationPath, fileName);
            using var image = pdfDocument.DecodeImage(imgObj.ImageObjectNumber);
            // Every image gets its own file, so replace any copy left by an earlier run
            codecs.Save(image, destinationFilePath, RasterImageFormat.TifLzw, image.BitsPerPixel, 1, 1, -1, CodecsSavePageMode.Overwrite);
        }
    }
}
Java code to extract images embedded in a PDF
/**
 * Extracts image objects that are embedded in a PDF file
 * and saves them to a folder next to the PDF called images
 * <p/>
 * e.g. images found in "c:\\temp\\a.pdf" are saved to "c:\\temp\\images\\"
 *
 * @param pdfPath path to the PDF file to read
 */
private static void extractImagesFromPdf(String pdfPath) {
    // getOutputFolder, getFileName, and getBaseName are helper methods defined elsewhere
    final String destinationFolder = getOutputFolder(pdfPath);
    final String documentName = getBaseName(getFileName(pdfPath));
    final PDFDocument pdfDocument = new PDFDocument(pdfPath);
    // Parse the objects on every page, from page 1 to the last page (-1)
    pdfDocument.parsePages(PDFParsePagesOptions.OBJECTS.getValue(), 1, -1);
    final RasterCodecs codecs = new RasterCodecs();
    try {
        final List<PDFDocumentPage> pages = pdfDocument.getPages();
        for (PDFDocumentPage page : pages) {
            final int pageNumber = page.getPageNumber();
            for (final PDFObject object : page.getObjects()) {
                if (object.getObjectType() == PDFObjectType.IMAGE) {
                    final String imageObjectNumber = object.getImageObjectNumber();
                    final String destinationFilePath = destinationFolder + documentName + "~page-" + pageNumber + "~"
                            + imageObjectNumber + ".tif";
                    final RasterImage image = pdfDocument.decodeImage(imageObjectNumber);
                    try {
                        codecs.save(image, destinationFilePath, RasterImageFormat.TIFLZW, image.getBitsPerPixel(),
                                1, 1, -1, CodecsSavePageMode.OVERWRITE);
                    } finally {
                        // Release the decoded image even if saving fails
                        image.dispose();
                    }
                }
            }
        }
    } finally {
        codecs.dispose();
    }
}
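The Java sample relies on three helpers, getOutputFolder, getFileName, and getBaseName, whose bodies are not part of this post. The sketch below is one possible way to write them with the standard java.nio.file API. It is an assumption based on the Javadoc above (an "images" folder next to the PDF, returned with a trailing separator), not the original implementation, and the class name PdfPathHelpers is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helpers; the names match the calls in the sample, the bodies are assumptions.
final class PdfPathHelpers {
    private PdfPathHelpers() {
    }

    /** Returns the last element of a path, e.g. "a.pdf" for a path ending in "a.pdf". */
    static String getFileName(String path) {
        return Paths.get(path).getFileName().toString();
    }

    /** Strips the final extension, e.g. "a" for "a.pdf". Names without an extension are returned unchanged. */
    static String getBaseName(String fileName) {
        final int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /**
     * Returns the "images" folder next to the PDF, with a trailing separator
     * so that a file name can be appended directly. The folder is created if needed.
     */
    static String getOutputFolder(String pdfPath) {
        final Path parent = Paths.get(pdfPath).toAbsolutePath().getParent();
        final Path images = parent.resolve("images");
        try {
            Files.createDirectories(images);
        } catch (IOException e) {
            // Rethrow unchecked so callers do not need a throws clause
            throw new UncheckedIOException(e);
        }
        return images.toString() + File.separator;
    }
}
```

Creating the folder inside getOutputFolder keeps the extraction method free of file-system setup, and the unchecked exception means extractImagesFromPdf can call it without declaring IOException.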
PowerShell code to extract images embedded in a PDF
function Export-LtImagesFromPdf {
    <#
    .SYNOPSIS
        Exports images embedded in a PDF file
    .DESCRIPTION
        Decodes every image object embedded in a PDF file and saves each one as a TIFF file in the given folder. The folder is created if it does not exist.
    .PARAMETER PdfPath
        File path to the PDF file that has embedded images to be exported
    .PARAMETER Path
        Folder path to export the embedded images
    .EXAMPLE
        Export-LtImagesFromPdf -PdfPath "c:\temp\a.pdf" -Path "c:\temp\images\"
    .INPUTS
        String
    .OUTPUTS
        None
    .NOTES
        Author: LEAD Technologies, Inc.
        Website: https://www.leadtools.com
        Twitter: @leadtools
    #>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$PdfPath,
        [Parameter(Mandatory)]
        [string]$Path
    )
    if ( -not (Test-Path -Path $PdfPath -PathType Leaf) ) {
        Write-Error "File does not exist: $PdfPath"
        return
    }
    if ( -not (Test-Path -Path $Path -PathType Container) ) {
        # Discard the directory object so the function writes nothing to the pipeline
        New-Item -Path $Path -ItemType Directory | Out-Null
    }
    $baseFileName = (Get-Item $PdfPath).BaseName
    $pdfDocument = New-Object -TypeName Leadtools.Pdf.PDFDocument -ArgumentList $PdfPath
    try {
        # Parse the objects on every page, from page 1 to the last page (-1)
        $pdfDocument.ParsePages([Leadtools.Pdf.PDFParsePagesOptions]::Objects, 1, -1)
        foreach ($page in $pdfDocument.Pages) {
            foreach ($object in $page.Objects) {
                if ( $object.ObjectType -eq [Leadtools.Pdf.PDFObjectType]::Image ) {
                    $imageObjectNumber = $object.ImageObjectNumber
                    $pageNumber = $page.PageNumber
                    $image = $pdfDocument.DecodeImage($imageObjectNumber)
                    try {
                        $outputFilePath = (Join-Path -Path $Path -ChildPath ($baseFileName + "~page#-" + $pageNumber + "~" + $imageObjectNumber + ".tif"))
                        # Export-LTImage is a helper function defined elsewhere
                        Export-LTImage -RasterImage $image -Path $outputFilePath -Format ([Leadtools.RasterImageFormat]::Tif)
                    }
                    finally {
                        # Release the decoded image even if the export fails
                        $image.Dispose()
                    }
                }
            }
        }
    }
    finally {
        $pdfDocument.Dispose()
    }
}
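For the "hundreds of PDF files" case mentioned in the introduction, any of the samples above can be driven over a whole folder. Here is a minimal Java sketch of that loop using only the standard library. The class PdfBatch and its methods are hypothetical names, and the extractor argument stands in for a method such as extractImagesFromPdf. The sketch assumes a failing file should be logged and skipped rather than stopping the run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch driver; it contains no LEADTOOLS calls itself.
final class PdfBatch {
    private PdfBatch() {
    }

    /** Lists the regular files in a folder whose names end in ".pdf", ignoring case. Subfolders are not searched. */
    static List<Path> findPdfFiles(Path folder) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /**
     * Passes every PDF path in the folder to the extractor.
     * A file that makes the extractor throw is logged and skipped.
     * Returns the number of files that failed.
     */
    static int processAll(Path folder, Consumer<String> extractor) throws IOException {
        int failures = 0;
        for (Path pdf : findPdfFiles(folder)) {
            try {
                extractor.accept(pdf.toString());
            } catch (RuntimeException e) {
                failures++;
                System.err.println("Skipping " + pdf + ": " + e.getMessage());
            }
        }
        return failures;
    }
}
```

From inside the class that declares the Java sample, a call could look like PdfBatch.processAll(Paths.get("c:\\temp"), path -> extractImagesFromPdf(path)). Passing the extractor as a Consumer keeps the folder walk independent of the imaging code, so it can be exercised without the SDK installed.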
With LEADTOOLS in your collection of toolkits, extracting embedded images is only one of the many things you can do with PDF files.
See for yourself – Free evaluation
Download the LEADTOOLS SDK for free. It's fully functional for 60 days and comes with free chat and email support.
Stay tuned for more PDF code samples
Did you see our previous post, “C# and Java Code to Digitally Sign PDF Files With a Certificate”? Stay tuned for more PDF examples to see how LEADTOOLS fits into any workflow involving PDF files.
Need help in the meantime?
Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team via email or call us at +1-704-332-5532.