Extract Embedded Images from PDF Documents Using LEADTOOLS

Posted on 2020-01-24 Nick Villalobos

Digital images are everywhere you look. There's no escaping them. They can be found in just about every email, they're all over social media, and they can be embedded throughout PDFs. Some people embed images in PDFs to make the document look better or to provide visuals. Others, such as insurance adjusters, do so to present images for legal reasons.

Let's keep the focus on embedded images in PDFs and how you can use the LEADTOOLS PDF SDK to extract them. A PDF page can contain different types of objects, including Text, Rectangle, and Image. To extract the images, LEADTOOLS provides the DecodeImage method of the PDFDocument class. This method does exactly what you would think: it decodes the specified image object embedded in the PDF document.

The following snippet is the core code for extracting all image objects from a PDF.

using (PDFDocument document = new PDFDocument(sourceFileNamePath))
{
    document.Resolution = 200;

    // Parse the objects in all pages
    document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

    using (RasterCodecs codecs = new RasterCodecs())
    {
        // Look through each page in the document
        foreach (PDFDocumentPage page in document.Pages)
        {
            // Check the page for PDFObjects
            if (page.Objects != null && page.Objects.Count > 0)
            {
                foreach (PDFObject obj in page.Objects)
                {
                    // If the object type is an image, decode and save it
                    if (obj.ObjectType == PDFObjectType.Image)
                    {
                        using (RasterImage image = document.DecodeImage(obj.ImageObjectNumber))
                        {
                            // Note: every image is written to the same destinationFileNamePath.
                            // PNG stores one image per file, so a multipage format (such as TIFF)
                            // or a unique path per image may be needed to keep them all.
                            codecs.Save(image, destinationFileNamePath, RasterImageFormat.Png,
                                image.BitsPerPixel, 1, 1, -1, CodecsSavePageMode.Append);
                        }
                    }
                }
            }
        }
    }
}

Download Project

I also have a full project that will scan all PDFs from a given directory and extract all image objects. The application will then save each image to disk in its own folder based on the initial file name. Using my example from earlier, insurance adjusters who create PDFs with embedded images could use this to extract images of accidents, damage to property, etc.
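
For readers who want a feel for the overall flow before downloading anything, here is a minimal sketch of how such a directory scan might be structured. It is not the code from the downloadable project. The class name PdfImageExtractor, the method ExtractAll, its two parameters, and the image_NNNN.png naming scheme are all hypothetical, chosen only for illustration. The sketch also assumes the LEADTOOLS assemblies are referenced and that a valid license has already been set before the method is called.

```csharp
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Pdf;

// Hypothetical helper, not part of LEADTOOLS or the downloadable project
static class PdfImageExtractor
{
    // Extracts the image objects of every PDF in inputDirectory into
    // outputRoot/<PDF file name without extension>/image_NNNN.png
    public static void ExtractAll(string inputDirectory, string outputRoot)
    {
        using (RasterCodecs codecs = new RasterCodecs())
        {
            foreach (string pdfPath in Directory.EnumerateFiles(inputDirectory, "*.pdf"))
            {
                // One output folder per PDF, named after the source file
                string folder = Path.Combine(outputRoot, Path.GetFileNameWithoutExtension(pdfPath));
                Directory.CreateDirectory(folder);

                using (PDFDocument document = new PDFDocument(pdfPath))
                {
                    document.Resolution = 200;
                    document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

                    int imageIndex = 0;
                    foreach (PDFDocumentPage page in document.Pages)
                    {
                        // Skip pages that have no parsed objects
                        if (page.Objects == null)
                            continue;

                        foreach (PDFObject obj in page.Objects)
                        {
                            if (obj.ObjectType != PDFObjectType.Image)
                                continue;

                            using (RasterImage image = document.DecodeImage(obj.ImageObjectNumber))
                            {
                                // Assumed naming scheme: a running counter per PDF,
                                // so each image lands in its own file
                                imageIndex++;
                                string target = Path.Combine(folder, $"image_{imageIndex:D4}.png");
                                codecs.Save(image, target, RasterImageFormat.Png, image.BitsPerPixel);
                            }
                        }
                    }
                }
            }
        }
    }
}
```

A call such as PdfImageExtractor.ExtractAll(@"C:\Claims\Incoming", @"C:\Claims\Images") (both paths are made up) would then produce one folder per PDF. Writing each image to its own file sidesteps the question of how a single-image format like PNG behaves when several images are saved to the same path.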

Support

Need help getting this sample up and going? Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team or call us at 704-332-5532.