Split a PDF File into Multiple Files - Java

This tutorial shows three different techniques to individually save each page of a multipage PDF in a Java application using the LEADTOOLS SDK.

Overview
Summary	This tutorial covers how to split multipage PDF files in a Java Console application.
Completion Time	30 minutes
Eclipse Project	Download tutorial project (3 KB)
Platform	Java Application
IDE	Eclipse
Development License	Download LEADTOOLS
Try it in another language	C#: .NET Framework (Console) Java: Java

Required Knowledge

Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial before working on this tutorial.

Create the Project and Add the LEADTOOLS References

Start with a copy of the project created in the Add References and Set a License tutorial. If that project is unavailable, follow the steps in that tutorial to create it.

The references needed depend upon the purpose of the project. References can be added as local .jar files located at <INSTALL_DIR>\LEADTOOLS22\Bin\Java.

For this project, the following references are needed:

leadtools.annotations.engine.jar
leadtools.caching.jar
leadtools.codecs.jar
leadtools.document.converter.jar
leadtools.document.jar
leadtools.document.pdf.jar
leadtools.document.writer.jar
leadtools.imageprocessing.core.jar
leadtools.jar
leadtools.ocr.jar
leadtools.pdf.jar
leadtools.svg.jar

This tutorial uses LEADTOOLS Codec library support. For a complete list of which JAR files are required for your application, refer to Files to be Included with your Java Application.

Set the License File

The license unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.

There are two types of runtime licenses:

Evaluation license, obtained at the time the evaluation toolkit is downloaded. It allows the toolkit to be evaluated.
Deployment license. If a Deployment license file and developer key are needed, refer to Obtaining a License.

Note

Adding LEADTOOLS references and setting a license are covered in more detail in the Add References and Set a License tutorial.

Add the Split Pages Code

With the project created, the references added, and the license set, coding can begin.

Open the _Main.java class in the Project Explorer. Add the following statements to the import block at the top.

Java

// import block at the top 
import java.io.IOException; 
import java.nio.file.*; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
 
import leadtools.*; 
import leadtools.codecs.*; 
import leadtools.pdf.*; 
import leadtools.document.*; 
import leadtools.document.converter.*; 
import leadtools.document.writer.*; 
import leadtools.ocr.*;

Add the code below to the run() method to create the split files directory and call the methods created in the sections below.

Java

private void run(String[] args) { 
   try { 
      Platform.setLibPath("C:\\LEADTOOLS22\\Bin\\CDLL\\x64"); 
      Platform.loadLibrary(LTLibrary.LEADTOOLS); 
      Platform.loadLibrary(LTLibrary.CODECS); 
      Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER); 
      Platform.loadLibrary(LTLibrary.PDF); 
      Platform.loadLibrary(LTLibrary.OCR); 
          
      SetLicense(); 
          
      String multipageFile = "C:\\LEADTOOLS22\\Resources\\Images\\leadtools.pdf"; 
      String _splitDir = "C:\\LEADTOOLS22\\Resources\\Images\\Split PDFs"; 
      if(!Files.exists(Paths.get(_splitDir))) 
         Files.createDirectory(Paths.get(_splitDir)); 
      splitUsingRasterCodecs(multipageFile, _splitDir); 
      splitUsingPDFFile(multipageFile, _splitDir); 
      splitUsingLEADDocument(multipageFile, _splitDir);         
   }  
   catch(Exception ex) { 
      System.err.println(ex.getMessage()); 
      ex.printStackTrace(); 
   } 
}
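
The run() method above checks whether the split directory exists before creating it. As a minimal sketch using only the standard java.nio.file API, the same step can be done with Files.createDirectories, which also creates missing parent folders and does nothing when the directory already exists. The SplitDirectory class and its ensure method are hypothetical names used for illustration; they are not part of the LEADTOOLS SDK or the tutorial project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
final class SplitDirectory {
    // Creates the directory and any missing parents, then returns its path.
    // Calling it again for an existing directory is a no-op.
    static Path ensure(String directory) throws IOException {
        return Files.createDirectories(Paths.get(directory));
    }
}
```

With this helper, the two-line exists/createDirectory check in run() could be reduced to a single call such as SplitDirectory.ensure(_splitDir).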

Three different techniques for splitting the pages of a PDF file are discussed below. Each has its own advantages.

Method 1: Use RasterCodecs

In this approach, each page is loaded as a raster (bitmap) image, then saved as a raster PDF file. This is done using the RasterCodecs class.

The main advantage of this approach is code simplicity. It only takes a few lines of code, and the exact same code can be used for other multipage formats such as TIFF or GIF.

Create a new method in the _Main class named splitUsingRasterCodecs(String inputFile, String _directory). This method will be called inside the run() method, as shown above.

Java

void splitUsingRasterCodecs(String inputFile, String _directory) { 
   RasterCodecs codecs = new RasterCodecs(); 
   codecs.getOptions().getPdf().setInitialPath("C:\\LEADTOOLS22\\Bin\\CDLL\\x64"); 
   int totalPages = codecs.getTotalPages(inputFile); 
   System.out.println("SplitUsingRasterCodecs..\nTotal pages:" + totalPages + " Splitting pages:"); 
       
   for (int page = 1; page <= totalPages; page++)  
   { 
      System.out.println(page + ".."); 
      String outputFilename = Paths.get(inputFile).toFile().getName(); 
      if(outputFilename.lastIndexOf('.') != -1) 
         outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.')); 
      outputFilename = outputFilename + "_codecs_page" + page + ".pdf"; 
      String outputFile = _directory + "\\" + outputFilename; 
      RasterImage image = codecs.load(inputFile, page); 
      codecs.save(image, outputFile, RasterImageFormat.RAS_PDF_LZW, 0); 
      image.dispose(); 
   } 
   codecs.dispose(); 
}
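
The loop above builds each output name by stripping the extension from the input file name and appending a technique tag and the page number, and the same logic is repeated in the other two methods. A possible way to factor it out is sketched below using only standard Java. PageFileNames, pageFileName, and pageFile are hypothetical names introduced for illustration, and Path.resolve is used in place of the hard-coded "\\" separator.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
final class PageFileNames {
    // Strips the last extension (if any) and appends "_<tag>_page<N>.<extension>".
    static String pageFileName(String inputFileName, String tag, int page, String extension) {
        int dot = inputFileName.lastIndexOf('.');
        String baseName = (dot != -1) ? inputFileName.substring(0, dot) : inputFileName;
        return baseName + "_" + tag + "_page" + page + "." + extension;
    }

    // Combines the output directory with the per-page PDF file name.
    static Path pageFile(String inputFile, String directory, String tag, int page) {
        String inputFileName = Paths.get(inputFile).getFileName().toString();
        return Paths.get(directory).resolve(pageFileName(inputFileName, tag, page, "pdf"));
    }
}
```

Inside the loop, the output path could then be obtained with something like PageFileNames.pageFile(inputFile, _directory, "codecs", page).toString().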

Method 2: Use PDFFile

In this approach, the PDFFile class is used, which is a dedicated class for the PDF format. This means the code cannot be used with other document or image formats.

The main advantage of this approach is that it preserves the contents of PDF pages since it does not convert searchable text to raster images. Additionally, in many cases it does not cause re-encoding of images that exist in the original PDF file, which improves performance and maintains image quality. The code is also very simple.

Create a new method in the _Main class named splitUsingPDFFile(String inputFile, String _directory). This method will be called inside the run() method, as shown above.

Java

void splitUsingPDFFile(String inputFile, String _directory) { 
   PDFFile pdfFile = new PDFFile(inputFile); 
   int totalPages = pdfFile.getPageCount(); 
   System.out.println("SplitUsingPDFFile..\nTotal pages:" + totalPages + " Splitting pages:"); 
   for (int page = 1; page <= totalPages; page++)  
   { 
      System.out.println(page + ".."); 
      String outputFilename = Paths.get(inputFile).toFile().getName(); 
      if(outputFilename.lastIndexOf('.') != -1) 
         outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.')); 
      outputFilename = outputFilename + "_pdffile_page" + page + ".pdf"; 
      String outputFile = _directory + "\\" + outputFilename; 
      pdfFile.extractPages(page, page, outputFile); 
   } 
}

Method 3: Use LEADDocument

This approach is the most advanced of the three and it utilizes the LEADDocument and DocumentConverter classes.

Because these classes work with many formats, similar code can be used to split many types of document files and to output different document and raster formats. For example, in the code below, changing DocumentFormat.PDF to DocumentFormat.DOCX (and the output file extension to match) splits the file into Microsoft Word output pages instead of PDF pages. Additionally, these classes produce optimized output files.

Configure Executor Service

Because document conversion jobs are asynchronous, a Java ExecutorService must be configured and assigned to the RasterDefaults class.

Define an ExecutorService field in the _Main class and add the code below to the run() method:

Java

private ExecutorService service; 

private void run(String[] args) { 
   try { 
      Platform.setLibPath("C:\\LEADTOOLS22\\Bin\\CDLL\\x64"); 
      Platform.loadLibrary(LTLibrary.LEADTOOLS); 
      Platform.loadLibrary(LTLibrary.CODECS); 
      Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER); 
      Platform.loadLibrary(LTLibrary.PDF); 
      Platform.loadLibrary(LTLibrary.OCR); 
       
      SetLicense(); 
       
      // Set ExecutorService in RasterDefaults for Document Converter Jobs 
      service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); 
      RasterDefaults.setExecutorService(service); 
       
      String multipageFile = "C:\\LEADTOOLS22\\Resources\\Images\\leadtools.pdf"; 
      String _splitDir = "C:\\LEADTOOLS22\\Resources\\Images\\Split PDFs"; 
      if(!Files.exists(Paths.get(_splitDir))) 
         Files.createDirectory(Paths.get(_splitDir)); 
      splitUsingRasterCodecs(multipageFile, _splitDir); 
      splitUsingPDFFile(multipageFile, _splitDir); 
      splitUsingLEADDocument(multipageFile, _splitDir);         
   }  
   catch(Exception ex) { 
      System.err.println(ex.getMessage()); 
      ex.printStackTrace(); 
   } 
}
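
A pool created with Executors.newFixedThreadPool uses non-daemon threads by default, so the JVM keeps running until the pool is shut down. The tutorial code does not show that step. The sketch below is one standard-library way to do it, under the assumption that all conversion jobs have been submitted before it is called. ExecutorCleanup and shutdownAndWait are hypothetical names, not LEADTOOLS API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
final class ExecutorCleanup {
    // Stops accepting new tasks, waits up to timeoutSeconds for running ones,
    // then cancels whatever is left. Returns true if the pool terminated.
    static boolean shutdownAndWait(ExecutorService service, long timeoutSeconds) {
        service.shutdown();
        try {
            if (!service.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                service.shutdownNow();
                return service.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException ex) {
            // Restore the interrupt flag and give up waiting.
            service.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

One place to call it would be a finally block at the end of run(), for example ExecutorCleanup.shutdownAndWait(service, 60); the 60-second timeout is an arbitrary illustrative value.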

DocumentConverter Code

Create a new method in the _Main class named splitUsingLEADDocument(String inputFile, String _directory). This method will be called inside the run() method, as shown above.

Java

void splitUsingLEADDocument(String inputFile, String _directory) { 
   DocumentWriter documentWriter = new DocumentWriter(); 
   // Optional: use documentWriter.getOptions() and documentWriter.setOptions() to modify PDF options  
   var createOptions = new CreateDocumentOptions(); 
       
   LEADDocument inputDocument = DocumentFactory.loadFromFile(inputFile, new LoadDocumentOptions()); 
   OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
   ocrEngine.startup(null, null, null, "C:\\LEADTOOLS22\\Bin\\Common\\OcrLEADRuntime"); 
   System.out.println("SplitUsingLEADDocument..\nTotal pages:" + inputDocument.getPages().getOriginalPageCount() + " Splitting pages:"); 
   for(var inputPage : inputDocument.getPages()) 
   { 
      LEADDocument pageDocument = DocumentFactory.create(createOptions); 
      pageDocument.setAutoDisposeDocuments(true); 
      pageDocument.setName("VirtualPage"); 
      pageDocument.getPages().add(inputPage); 
      DocumentConverter docConverter = new DocumentConverter(); 
      docConverter.setDocumentWriterInstance(documentWriter); 
      int page = inputDocument.getPages().indexOf(inputPage) + 1; // (+ 1) since index is zero-based 
      System.out.println(page + ".."); 
      var jobData = new DocumentConverterJobData(); 
      jobData.setDocument(pageDocument); 
      String outputFilename = Paths.get(inputFile).toFile().getName(); 
      if(outputFilename.lastIndexOf('.') != -1) 
         outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.')); 
      outputFilename = outputFilename + "_LeadDoc_page" + page + ".pdf"; 
      String outputFile = _directory + "\\" + outputFilename; 
      jobData.setOutputDocumentFileName(outputFile); 
      jobData.setDocumentFormat(DocumentFormat.PDF); 
      var job = docConverter.getJobs().createJob(jobData); 
      docConverter.getJobs().runJob(job); 
   } 
   System.out.println(); 
   ocrEngine.shutdown(); 
}

Handling Streams

To handle the files using I/O streams, add a statement to the import block at the top to import the java.io.InputStream class.

Java

import java.io.IOException; 
import java.io.InputStream; 
import java.nio.file.*; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
 
import leadtools.*; 
import leadtools.codecs.*; 
import leadtools.pdf.*; 
import leadtools.document.*; 
import leadtools.document.converter.*; 
import leadtools.document.writer.*; 
import leadtools.ocr.*;

Replace the existing code in the run() method with the following:

Java

 private void run(String[] args) { 
        try { 
            Platform.setLibPath("C:\\LEADTOOLS22\\Bin\\CDLL\\x64"); 
            Platform.loadLibrary(LTLibrary.LEADTOOLS); 
            Platform.loadLibrary(LTLibrary.CODECS); 
            Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER); 
            Platform.loadLibrary(LTLibrary.PDF); 
            Platform.loadLibrary(LTLibrary.OCR); 
 
            SetLicense(); 
 
            // Set ExecutorService in RasterDefaults for Document Converter Jobs 
            service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); 
            RasterDefaults.setExecutorService(service); 
 
            String multipageFile = "C:\\LEADTOOLS22\\Resources\\Images\\leadtools.pdf"; 
            InputStream multipageInputStream = Files.newInputStream(Paths.get(multipageFile)); 
            LeadDynamicStream multipageLeadDynamicStream = new LeadDynamicStream(multipageInputStream, false); 
            splitUsingRasterCodecs(multipageLeadDynamicStream); 
            splitUsingLEADDocument(multipageLeadDynamicStream); 
 
        } catch (Exception ex) { 
            System.err.println(ex.getMessage()); 
            ex.printStackTrace(); 
        } 
    }
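
The run() method above opens an InputStream with Files.newInputStream and never closes it. A minimal standard-library sketch of scoping that stream with try-with-resources follows. StreamScope, StreamAction, and withInputStream are hypothetical names used for illustration. Whether the LeadDynamicStream wrapper needs the underlying stream to stay open beyond the split calls is not covered by this tutorial, so treat the sketch as an assumption to verify rather than a required change.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
final class StreamScope {
    // Callback that receives the open stream and may throw.
    @FunctionalInterface
    interface StreamAction {
        void accept(InputStream in) throws Exception;
    }

    // Opens the file, runs the action, and always closes the stream afterwards.
    static void withInputStream(Path file, StreamAction action) throws Exception {
        try (InputStream in = Files.newInputStream(file)) {
            action.accept(in);
        }
    }
}
```

Under that assumption, the LeadDynamicStream construction and the two stream-based split calls would go inside the action body passed to withInputStream.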

Add the splitUsingRasterCodecs method overload which handles an ILeadStream object.

Java

void splitUsingRasterCodecs(ILeadStream inputStream) { 
   RasterCodecs codecs = new RasterCodecs(); 
   codecs.getOptions().getPdf().setInitialPath("C:\\LEADTOOLS22\\Bin\\CDLL\\x64"); 
   int totalPages = codecs.getTotalPages(inputStream); 
   System.out.println("SplitUsingRasterCodecs..\nTotal pages:" + totalPages + " Splitting pages:"); 
 
   for (int page = 1; page <= totalPages; page++) { 
      System.out.println(page + ".."); 
      RasterImage image = codecs.load(inputStream, page); 
      LeadDynamicStream leadDynamicStream = new LeadDynamicStream(); 
      codecs.save(image, leadDynamicStream, RasterImageFormat.RAS_PDF_LZW, 0); 
      // Use output Stream containing the split file before it is closed and freed for the next page 
      leadDynamicStream.close(); 
      leadDynamicStream.dispose(); 
      image.dispose(); 
   } 
   System.out.println(); 
   codecs.dispose(); 
}

Add the splitUsingLEADDocument method overload which handles an ILeadStream object.

Java

void splitUsingLEADDocument(ILeadStream inputStream) { 
    DocumentWriter documentWriter = new DocumentWriter(); 
    // Optional: use documentWriter.getOptions() and documentWriter.setOptions() to modify PDF options 
    var createOptions = new CreateDocumentOptions(); 
 
    LEADDocument inputDocument = DocumentFactory.loadFromStream(inputStream, new LoadDocumentOptions()); 
    OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
    ocrEngine.startup(null, null, null, "C:\\LEADTOOLS22\\Bin\\Common\\OcrLEADRuntime"); 
    System.out.println("SplitUsingLEADDocument..\nTotal pages:" + inputDocument.getPages().getOriginalPageCount() 
            + " Splitting pages:"); 
    for (var inputPage : inputDocument.getPages()) { 
        LEADDocument pageDocument = DocumentFactory.create(createOptions); 
        pageDocument.setAutoDisposeDocuments(true); 
        pageDocument.setName("VirtualPage"); 
        pageDocument.getPages().add(inputPage); 
        DocumentConverter docConverter = new DocumentConverter(); 
        docConverter.setDocumentWriterInstance(documentWriter); 
        int page = inputDocument.getPages().indexOf(inputPage) + 1; // (+ 1) since index is zero-based 
        System.out.println(page + ".."); 
        var jobData = new DocumentConverterJobData(); 
        jobData.setDocument(pageDocument); 
        jobData.setOutputDocumentStream(new LeadDynamicStream()); 
        jobData.setDocumentFormat(DocumentFormat.PDF); 
        jobData.setJobName("LeadDoc_page" + page); 
        var job = docConverter.getJobs().createJob(jobData); 
        Jobs_JobCompleted jobsCompleted = new Jobs_JobCompleted(); 
        docConverter.getJobs().addJobCompletedListener(jobsCompleted); 
        docConverter.getJobs().runJob(job); 
    } 
}

Add the Jobs_JobCompleted event listener class that will handle the asynchronous jobs from the Document Converter above to access the output document stream.

Java

class Jobs_JobCompleted implements DocumentConverterJobEventListener { 
    public void onEvent(DocumentConverterJobEvent e) { 
        if (e.getOperation() == DocumentConverterJobOperation.COMPLETED) { 
            LeadDynamicStream outputDocumentStream = (LeadDynamicStream) e.getJob().getJobData().getOutputDocumentStream(); 
            // Use Output Document Stream containing the split file before it is closed and freed for the next page 
            outputDocumentStream.close(); 
            outputDocumentStream.dispose(); 
        } 
    } 
}

Run the Project

Run the project by pressing Ctrl + F11, or by selecting Run -> Run.

If the steps were followed correctly, the application runs and saves each page of leadtools.pdf as a separate PDF file using each of the three techniques, with the page number appended to each file name.
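
One way to confirm the result of the file-based run() is to list the split directory and compare the names against the expected pattern. The sketch below uses only the standard java.nio.file API. SplitOutputCheck and listSplitFiles are hypothetical names, and the glob in the usage note assumes the file naming used in the three split methods above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
final class SplitOutputCheck {
    // Returns the sorted names of the files in directory that match the glob.
    static List<String> listSplitFiles(Path directory, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, glob)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

For example, calling listSplitFiles on the Split PDFs folder with the glob "leadtools_*_page*.pdf" should return one entry per page for each of the three techniques.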

Wrap-up

This tutorial showed how to add the necessary references to load all the pages of a PDF file and split them into separate documents using various techniques.
