C#, Java, and PowerShell Code to Extract Images Embedded in a PDF

PDF files are ubiquitous in the world of business, where they are used to store documents, images, and other data. People often ask whether there is an easy way to extract graphics, such as a chart or a photo, from a PDF file. If you’re looking to get all of the images from a PDF file, or have hundreds of PDF files or more to process, then the answer is to use LEADTOOLS.

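Extracting images that are embedded in a PDF file is easy with LEADTOOLS. Below are C#, Java, and PowerShell code samples that use LEADTOOLS to extract images from a PDF file. Each sample parses the objects on every page, decodes every object of type image, and saves it as a TIFF file whose name combines the PDF file name, the page number, and the image object number.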

C# code to extract images embedded in a PDF

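using System.IO;
using System.Linq;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Pdf;

/// <summary>
/// Extracts image objects embedded in a PDF file and saves each one as a TIFF file
/// in a folder called "images" next to the PDF
/// </summary>
/// <param name="pdfPath">Full path to the PDF file whose embedded images are extracted</param>
private static void ExtractImagesFromPdf(string pdfPath)
{
    var destinationPath = Path.Combine(Path.GetDirectoryName(pdfPath), "images");
    var documentName = Path.GetFileNameWithoutExtension(pdfPath);

    // Saving fails if the output folder is missing, so create it first
    Directory.CreateDirectory(destinationPath);

    using var pdfDocument = new PDFDocument(pdfPath);
    // Parse the objects on all pages (first page 1, last page -1)
    pdfDocument.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

    // One RasterCodecs instance is enough for the whole document
    using var codecs = new RasterCodecs();

    foreach (var page in pdfDocument.Pages)
    {
        var embeddedImages = page.Objects.Where(o => o.ObjectType == PDFObjectType.Image);
        foreach (var imgObj in embeddedImages)
        {
            // e.g. images\report~page-1~5.tif
            var fileName = $"{documentName}~page-{page.PageNumber}~{imgObj.ImageObjectNumber}.tif";
            var destinationFilePath = Path.Combine(destinationPath, fileName);

            using var image = pdfDocument.DecodeImage(imgObj.ImageObjectNumber);
            // Overwrite, so re-running the method does not append duplicate pages to existing files
            codecs.Save(image, destinationFilePath, RasterImageFormat.TifLzw, image.BitsPerPixel, 1, 1, -1, CodecsSavePageMode.Overwrite);
        }
    }
}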

Java code to extract images embedded in a PDF

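import java.util.List;

import leadtools.RasterImage;
import leadtools.RasterImageFormat;
import leadtools.codecs.CodecsSavePageMode;
import leadtools.codecs.RasterCodecs;
import leadtools.pdf.PDFDocument;
import leadtools.pdf.PDFDocumentPage;
import leadtools.pdf.PDFObject;
import leadtools.pdf.PDFObjectType;
import leadtools.pdf.PDFParsePagesOptions;

/**
* Extracts image objects that are embedded in a PDF file
* and saves each one as a TIFF file in a folder next to the PDF called images
* <p/>
* e.g. the images in "c:\temp\a.pdf" are saved to "c:\temp\images\"
*
* @param pdfPath full path to the PDF file whose embedded images are extracted
*/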
private static void extractImagesFromPdf(String pdfPath) {
    final String destinationFolder = getOutputFolder(pdfPath);
    final String documentName = getBaseName(getFileName(pdfPath));
    final PDFDocument pdfDocument = new PDFDocument(pdfPath);
    pdfDocument.parsePages(PDFParsePagesOptions.OBJECTS.getValue(), 1, -1);
    final RasterCodecs codecs = new RasterCodecs();
    try {
        final List<PDFDocumentPage> pages = pdfDocument.getPages();
        for (PDFDocumentPage page : pages) {
            final int pageNumber = page.getPageNumber();
            for (final PDFObject object : page.getObjects()) {
                if (object.getObjectType() == PDFObjectType.IMAGE) {
                    final int imageObjectNumber = object.getImageObjectNumber();
                    final String destinationFilePath = destinationFolder + documentName + "~page-" + pageNumber + "~"
                            + imageObjectNumber + ".tif";
                    final RasterImage image = pdfDocument.decodeImage(imageObjectNumber);
                    try {
                        codecs.save(image, destinationFilePath, RasterImageFormat.TIFLZW, image.getBitsPerPixel(),
                                1, 1, -1, CodecsSavePageMode.OVERWRITE);
                    } finally {
                        image.dispose();
                    }
                }
            }
        }
    } finally {
        codecs.dispose();
    }
}
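The Java sample calls three helpers, `getOutputFolder`, `getFileName`, and `getBaseName`, that the post does not show. Below is a minimal sketch of one possible implementation using only `java.nio`. The helper bodies, and the extra `buildImageFileName` method that reproduces the naming pattern of the C# and Java samples, are assumptions, not LEADTOOLS APIs. `getOutputFolder` returns the folder with a trailing separator because the sample concatenates it directly with the file name. It also creates the `images` folder if it is missing.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helpers for the Java sample; the original post does not show them.
final class PdfImagePathHelpers {

    private PdfImagePathHelpers() { }

    /** File-name part of a path, e.g. "docs/report.pdf" -> "report.pdf". */
    static String getFileName(String path) {
        return Paths.get(path).getFileName().toString();
    }

    /** File name without its last extension, e.g. "report.pdf" -> "report". */
    static String getBaseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // dot > 0 keeps names such as ".hidden" intact
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /** Creates an "images" folder next to the PDF (if needed) and returns it with a trailing separator. */
    static String getOutputFolder(String pdfPath) {
        Path folder = Paths.get(pdfPath).toAbsolutePath().getParent().resolve("images");
        try {
            Files.createDirectories(folder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return folder + File.separator;
    }

    /** Output name used by the samples: <document>~page-<page>~<image object>.tif */
    static String buildImageFileName(String documentName, int pageNumber, int imageObjectNumber) {
        return documentName + "~page-" + pageNumber + "~" + imageObjectNumber + ".tif";
    }

    public static void main(String[] args) {
        String pdfPath = args.length > 0 ? args[0] : "report.pdf";
        String folder = getOutputFolder(pdfPath);
        String documentName = getBaseName(getFileName(pdfPath));
        // Prints where image object 5 on page 1 would be saved
        System.out.println(folder + buildImageFileName(documentName, 1, 5));
    }
}
```

Running it with a PDF path prints the full output path for a hypothetical image object 5 on page 1, ending in `images` followed by `report~page-1~5.tif` for a file named `report.pdf`. As a side effect, it creates the `images` folder next to that path.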

PowerShell code to extract images embedded in a PDF

function Export-LtImagesFromPdf {
    <#
    .SYNOPSIS
        Exports images embedded in a PDF file

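    .DESCRIPTION
        Parses the objects on every page of the PDF file, decodes each embedded
        image, and saves it as a TIFF file whose name combines the PDF base name,
        the page number, and the image object number. The folder given by -Path
        is created if it does not exist.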

    .PARAMETER PdfPath
        File path to the PDF file that has embedded images to be exported

    .PARAMETER Path
        Folder path to export the embedded images

    .EXAMPLE
        Export-LtImagesFromPdf -PdfPath "c:\temp\a.pdf" -Path "c:\temp\images\"

    .INPUTS
        String

    .OUTPUTS
        void

    .NOTES
        Author:  LEAD Technologies, Inc.
        Website: https://www.leadtools.com
        Twitter: @leadtools
    #>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$PdfPath,

        [Parameter(Mandatory)]
        [string]$Path
    )

    if( -not(Test-Path -Path $PdfPath -PathType Leaf) ) {
        Write-Error "File does not exist: $PdfPath"
        return
    }

    if( -not(Test-Path -Path $Path -PathType Container) ) {
        # Discard the DirectoryInfo that New-Item writes to the pipeline
        $null = New-Item -Path $Path -ItemType Directory
    }

    $baseFileName = (Get-Item $PdfPath).Basename

    $pdfDocument = New-Object -TypeName Leadtools.Pdf.PDFDocument -ArgumentList $PdfPath
    try {
        # Parse the objects on all pages (first page 1, last page -1)
        $pdfDocument.ParsePages([Leadtools.Pdf.PDFParsePagesOptions]::Objects, 1, -1)

        ForEach ($page in $pdfDocument.Pages){
            ForEach($object in $page.Objects){
                if( $object.ObjectType -eq [Leadtools.Pdf.PDFObjectType]::Image ){
                    $imageObjectNumber = $object.ImageObjectNumber
                    $pageNumber = $page.PageNumber
                    $outputFilePath = Join-Path -Path $Path -ChildPath ($baseFileName + "~page-" + $pageNumber + "~" + $imageObjectNumber + ".tif")
                    $image = $pdfDocument.DecodeImage($imageObjectNumber)
                    try {
                        Export-LTImage -RasterImage $image -Path $outputFilePath -Format ([Leadtools.RasterImageFormat]::Tif)
                    }
                    finally {
                        $image.Dispose()
                    }
                }
            }
        }
    }
    finally {
        $pdfDocument.Dispose()
    }
}
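The introduction mentions processing hundreds of PDF files. Below is a minimal Java sketch of how a batch run might collect every `.pdf` file under a folder and pass each one to the `extractImagesFromPdf` method from the Java sample. `PdfBatch`, `findPdfFiles`, and `processAll` are hypothetical names, not part of LEADTOOLS. The demo `main` only prints the paths it would process.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch driver; the extractor passed in would be extractImagesFromPdf.
final class PdfBatch {

    private PdfBatch() { }

    /** Every regular file under root whose name ends in ".pdf" (any case), sorted by path. */
    static List<Path> findPdfFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Runs the extractor on each PDF; a failing file is reported and the batch continues. Returns the failure count. */
    static int processAll(Path root, Consumer<String> extractor) throws IOException {
        int failures = 0;
        for (Path pdf : findPdfFiles(root)) {
            try {
                extractor.accept(pdf.toString());
            } catch (RuntimeException e) {
                failures++;
                System.err.println("Failed: " + pdf + " (" + e.getMessage() + ")");
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // In a real run this would be: processAll(root, YourClass::extractImagesFromPdf)
        int failures = processAll(root, path -> System.out.println("Would extract images from " + path));
        System.out.println("Failures: " + failures);
    }
}
```

Catching `RuntimeException` per file keeps one damaged PDF from stopping the whole batch. Because `extractImagesFromPdf` declares no checked exceptions, a plain `Consumer<String>` is enough to pass it in. The generated `.tif` files are not picked up on a second run because only `.pdf` names match.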

With LEADTOOLS in your collection of toolkits, tasks like extracting every embedded image from a PDF file take only a few lines of code.

See for yourself – Free evaluation

Download the LEADTOOLS SDK for free. It’s fully functional for 60 days and comes with free chat and email support.

Stay tuned for more PDF code samples

Did you see our previous post, “C# and Java Code to Digitally Sign PDF Files With a Certificate”? Stay tuned for more PDF examples to see how LEADTOOLS fits into any workflow involving PDF files.