    Categories: PDF

Extract Embedded Images from PDF Documents Using LEADTOOLS

Digital images are everywhere you look. There’s no escaping them. They can be found in just about every email, they’re all over social media, and they can be embedded throughout PDFs. Some authors embed images in PDFs to make the document look better or to provide visuals. Others, such as insurance adjusters, embed them as a record for legal reasons.

Let’s keep the focus on embedded images in PDFs and how you can use the LEADTOOLS PDF SDK to extract them. A parsed PDF page exposes several kinds of objects: Text, Rectangle, and Image. To extract the images, LEADTOOLS provides the DecodeImage method of the PDFDocument class. This method does exactly what you would think: it decodes the specified image object embedded in the PDF document and returns it as a RasterImage.
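To get a feel for the object types before extracting anything, a minimal sketch like the one below could tally the parsed objects on each page by type. It uses only the calls that appear in this post (PDFDocument, ParsePages, Pages, Objects, ObjectType); the class and method names are hypothetical, and it assumes the LEADTOOLS license has already been set elsewhere in the application.

```csharp
using System;
using System.Collections.Generic;
using Leadtools.Pdf;

// Hypothetical helper, not part of the LEADTOOLS SDK.
static class PdfObjectInventory
{
    // Prints how many objects of each type were parsed on every page.
    public static void Print(string sourceFileNamePath)
    {
        using (PDFDocument document = new PDFDocument(sourceFileNamePath))
        {
            // Parse the objects in all pages (first page 1, last page -1 for "to the end")
            document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

            int pageIndex = 0;
            foreach (PDFDocumentPage page in document.Pages)
            {
                pageIndex++;
                var counts = new Dictionary<PDFObjectType, int>();

                if (page.Objects != null)
                {
                    foreach (PDFObject obj in page.Objects)
                    {
                        // TryGetValue leaves n at 0 when the type has not been seen yet
                        counts.TryGetValue(obj.ObjectType, out int n);
                        counts[obj.ObjectType] = n + 1;
                    }
                }

                foreach (KeyValuePair<PDFObjectType, int> pair in counts)
                    Console.WriteLine($"Page {pageIndex}: {pair.Value} x {pair.Key}");
            }
        }
    }
}
```

Calling PdfObjectInventory.Print with the path of a PDF shows quickly whether the document contains any Image objects worth decoding.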

The following snippet is the core of extracting all image objects from a PDF.

using Leadtools;
using Leadtools.Codecs;
using Leadtools.Pdf;

using (PDFDocument document = new PDFDocument(sourceFileNamePath))
{
    document.Resolution = 200;

    // Parse the objects in all pages
    document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

    using (RasterCodecs codecs = new RasterCodecs())
    {
        // Look through each page in the document
        foreach (PDFDocumentPage page in document.Pages)
        {
            // Skip pages that have no parsed objects
            if (page.Objects == null || page.Objects.Count == 0)
                continue;

            foreach (PDFObject obj in page.Objects)
            {
                // Only image objects can be decoded
                if (obj.ObjectType != PDFObjectType.Image)
                    continue;

                // Decode the image and save it to the destination file.
                // Note: PNG is a single-page format, so a unique
                // destinationFileNamePath per image may be needed to keep every image.
                using (RasterImage image = document.DecodeImage(obj.ImageObjectNumber))
                {
                    codecs.Save(image, destinationFileNamePath, RasterImageFormat.Png,
                        image.BitsPerPixel, 1, 1, -1, CodecsSavePageMode.Append);
                }
            }
        }
    }
}

Download Project

I also have a full project that will scan all PDFs from a given directory and extract all image objects. The application will then save each image to disk in its own folder based on the initial file name. Using my example from earlier, insurance adjusters who create PDFs with embedded images could use this to extract images of accidents, damage to property, etc.
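The directory workflow described above can be sketched roughly as follows. This is not the downloadable project itself, only a minimal illustration under a few assumptions: the class name, method name, and the image-0001.png naming scheme are hypothetical, only the top level of the input directory is scanned, and the LEADTOOLS license is assumed to have been set before the method is called.

```csharp
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Pdf;

// Hypothetical helper, not the downloadable project.
static class PdfImageExtractor
{
    // Scans inputDirectory for PDFs and writes each embedded image to
    // outputRoot/<pdf file name without extension>/image-NNNN.png
    public static void ExtractAll(string inputDirectory, string outputRoot)
    {
        using (RasterCodecs codecs = new RasterCodecs())
        {
            foreach (string pdfPath in Directory.EnumerateFiles(inputDirectory, "*.pdf"))
            {
                // One output folder per PDF, named after the source file
                string outputDirectory = Path.Combine(
                    outputRoot, Path.GetFileNameWithoutExtension(pdfPath));
                Directory.CreateDirectory(outputDirectory);

                using (PDFDocument document = new PDFDocument(pdfPath))
                {
                    document.Resolution = 200;
                    document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

                    int imageIndex = 0;
                    foreach (PDFDocumentPage page in document.Pages)
                    {
                        if (page.Objects == null)
                            continue;

                        foreach (PDFObject obj in page.Objects)
                        {
                            if (obj.ObjectType != PDFObjectType.Image)
                                continue;

                            using (RasterImage image = document.DecodeImage(obj.ImageObjectNumber))
                            {
                                // A distinct file per image avoids overwriting earlier results
                                imageIndex++;
                                string target = Path.Combine(
                                    outputDirectory, $"image-{imageIndex:D4}.png");
                                codecs.Save(image, target, RasterImageFormat.Png, image.BitsPerPixel);
                            }
                        }
                    }
                }
            }
        }
    }
}
```

A call such as PdfImageExtractor.ExtractAll(inputFolder, outputFolder), with both paths supplied by the caller, would leave one folder of PNG files per source PDF. Writing one file per image is a design choice made for this sketch; a multipage format could be used instead if a single output file per PDF is preferred.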

Support

Need help getting this sample up and going? Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team (sales@leadtools.com) or call us at 704-332-5532.

Nick Villalobos: Technical Marketing Engineer

View Comments

  • Hi Team, we have a requirement to compare two PDF files pixel by pixel, so that any images in the PDF files are compared as well. Is this type of comparison possible with LEADTOOLS in .NET Core? Regards, Sirisha

    • Hello Sirisha,

      The LEADTOOLS SDK can accomplish this type of comparison. We have an online demo that showcases how we achieve this functionality. The UI is in HTML but the actual comparisons are done on the server in .NET. You can find the demo here:
      https://demo.leadtools.com/JavaScript/DocumentComparison/index.html

      There is a tutorial walkthrough for setting up and using this demo here:
      https://www.leadtools.com/help/sdk/v22/tutorials/html5-get-started-with-the-document-compare-demo.html

      The documentation for the Leadtools.Document.Compare namespace can be found here:
      https://www.leadtools.com/help/sdk/v22/dh/dc/namespace.html

      Feel free to download our free 60-day evaluation from this link:
      https://www.leadtools.com/downloads

      And also feel free to reach out to our free technical support via email or chat:
      support@leadtools.com
      https://www.leadtools.com/support/chat

      Thanks,
      Zac