Extract Embedded Images from PDF Documents Using LEADTOOLS

Digital images are everywhere you look. There’s no escaping them. They show up in just about every email, they’re all over social media, and they can be embedded throughout PDFs. Some authors embed images in PDFs to make a document look better or to provide visuals. Others, such as insurance adjusters, embed them for legal reasons.

Let’s keep the focus on embedded images in PDFs and how you can use the LEADTOOLS PDF SDK to extract them. A PDF page can contain several kinds of objects: Text, Rectangle, and Image. To extract the images, LEADTOOLS provides the DecodeImage method of the PDFDocument class. This method does exactly what you would think: it decodes the specified image object embedded in the PDF document and returns it as a RasterImage.

The following is the core code for extracting all image objects from a PDF.

using (PDFDocument document = new PDFDocument(sourceFileNamePath))
{
    document.Resolution = 200;

    // Parse the objects in all pages
    document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

    using (RasterCodecs codecs = new RasterCodecs())
    {
        // Look through each page in the document
        foreach (PDFDocumentPage page in document.Pages)
        {
            // Check the page for PDFObjects
            if (page.Objects != null && page.Objects.Count > 0)
            {
                foreach (PDFObject obj in page.Objects)
                {
                    // If the object type is an image, decode and save it
                    if (obj.ObjectType == PDFObjectType.Image)
                    {
                        using (RasterImage image = document.DecodeImage(obj.ImageObjectNumber))
                        {
                            codecs.Save(image, destinationFileNamePath, RasterImageFormat.Png,
                                image.BitsPerPixel, 1, 1, -1, CodecsSavePageMode.Append);
                        }
                    }
                }
            }
        }
    }
}
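
Before extracting anything, it can help to see which object types a PDF actually contains. The sketch below reuses the same ParsePages call and tallies the objects on each page by their ObjectType. The class and method names (PdfObjectInventory, Print) are hypothetical and exist only for illustration, and the code assumes the LEADTOOLS license has already been set elsewhere in the application.

```csharp
using System;
using System.Collections.Generic;
using Leadtools.Pdf;

// Hypothetical helper, not part of LEADTOOLS.
static class PdfObjectInventory
{
    // Prints how many objects of each type every page contains.
    public static void Print(string sourceFileNamePath)
    {
        using (PDFDocument document = new PDFDocument(sourceFileNamePath))
        {
            // Parse the objects in all pages, as in the extraction code
            document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

            int pageNumber = 0; // our own 1-based counter
            foreach (PDFDocumentPage page in document.Pages)
            {
                pageNumber++;
                var counts = new Dictionary<PDFObjectType, int>();

                if (page.Objects != null)
                {
                    foreach (PDFObject obj in page.Objects)
                    {
                        // seen stays 0 when the type has not been counted yet
                        counts.TryGetValue(obj.ObjectType, out int seen);
                        counts[obj.ObjectType] = seen + 1;
                    }
                }

                foreach (KeyValuePair<PDFObjectType, int> entry in counts)
                    Console.WriteLine($"Page {pageNumber}: {entry.Key} x {entry.Value}");
            }
        }
    }
}
```

Calling PdfObjectInventory.Print(sourceFileNamePath) on a document gives a quick sense of whether it has any Image objects worth extracting.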

Download Project

I also have a full project that will scan all PDFs from a given directory and extract all image objects. The application will then save each image to disk in its own folder based on the initial file name. Using my example from earlier, insurance adjusters who create PDFs with embedded images could use this to extract images of accidents, damage to property, etc.
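
For a rough idea of what such a batch extractor could look like, here is a minimal sketch. It is not the downloadable project itself, only an illustration under a few assumptions: the names PdfImageBatchExtractor, ExtractAll, inputDirectory, and outputRoot are hypothetical, the image_0001.png naming scheme is my own, each image is written to its own PNG file rather than appended to a single destination, and the LEADTOOLS license is assumed to be set before this code runs.

```csharp
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Pdf;

// Hypothetical helper, not the downloadable project.
static class PdfImageBatchExtractor
{
    // Extracts every image object from every PDF in inputDirectory and
    // writes the images into outputRoot/<pdf file name>/.
    public static void ExtractAll(string inputDirectory, string outputRoot)
    {
        using (RasterCodecs codecs = new RasterCodecs())
        {
            foreach (string pdfPath in Directory.GetFiles(inputDirectory, "*.pdf"))
            {
                // One output folder per PDF, named after the source file
                string outputFolder = Path.Combine(
                    outputRoot, Path.GetFileNameWithoutExtension(pdfPath));
                Directory.CreateDirectory(outputFolder);

                using (PDFDocument document = new PDFDocument(pdfPath))
                {
                    document.Resolution = 200;
                    document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

                    int imageIndex = 0;
                    foreach (PDFDocumentPage page in document.Pages)
                    {
                        if (page.Objects == null)
                            continue;

                        foreach (PDFObject obj in page.Objects)
                        {
                            if (obj.ObjectType != PDFObjectType.Image)
                                continue;

                            imageIndex++;
                            // Assumed naming scheme: image_0001.png, image_0002.png, ...
                            string target = Path.Combine(
                                outputFolder, $"image_{imageIndex:D4}.png");

                            using (RasterImage image = document.DecodeImage(obj.ImageObjectNumber))
                            {
                                codecs.Save(image, target, RasterImageFormat.Png, image.BitsPerPixel);
                            }
                        }
                    }
                }
            }
        }
    }
}
```

A call such as PdfImageBatchExtractor.ExtractAll(inputDirectory, outputRoot) would then leave one folder of PNG files per source PDF. A real tool would likely add error handling for PDFs that fail to open.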

Support

Need help getting this sample up and going? Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team (sales@leadtools.com) or call us at 704-332-5532.


Tags: .NET, C#, image extraction, PDF

Nick Villalobos: Technical Marketing Engineer

Sirisha says:
2022-03-09 at 06:01
Hi Team, we have a requirement to compare two PDF files pixel by pixel, so that any images in the PDF files are compared as well. Is this type of comparison possible with LEADTOOLS in .NET Core? Regards, Sirisha

Zac Ferraresi says:
2022-03-09 at 14:39
Hello Sirisha,

The LEADTOOLS SDK can accomplish this type of comparison. We have an online demo that showcases how we achieve this functionality. The UI is in HTML but the actual comparisons are done on the server in .NET. You can find the demo here:
https://demo.leadtools.com/JavaScript/DocumentComparison/index.html

There is a tutorial walkthrough for setting up and using this demo here:
https://www.leadtools.com/help/sdk/v22/tutorials/html5-get-started-with-the-document-compare-demo.html

The documentation for the Leadtools.Document.Compare namespace can be found here:
https://www.leadtools.com/help/sdk/v22/dh/dc/namespace.html

Feel free to download our free 60-day evaluation from this link:
https://www.leadtools.com/downloads

And also feel free to reach out to our free technical support via email or chat:
support@leadtools.com
https://www.leadtools.com/support/chat

Thanks,
Zac
