Split a PDF File into Multiple Files - Java

This tutorial shows three different techniques to individually save each page of a multipage PDF in a Java application using the LEADTOOLS SDK.

Overview
Summary: This tutorial covers how to split multipage PDF files in a Java Console application.
Completion Time: 30 minutes
Eclipse Project: Download tutorial project (3 KB)
Platform: Java Application
IDE: Eclipse
Development License: Download LEADTOOLS

Required Knowledge

Before working on the Split a PDF File into Multiple Files - Java tutorial, review the Add References and Set a License tutorial to get familiar with the basic steps of creating a project.

Create the Project and Add the LEADTOOLS References

Start with a copy of the project created in the Add References and Set a License tutorial. If that project is unavailable, follow the steps in that tutorial to create it.

The references needed depend on the purpose of the project. References can be added as local .jar files located at <INSTALL_DIR>\LEADTOOLS23\Bin\Java.

For this project, the following references are needed:

This tutorial uses LEADTOOLS Codec library support. For a complete list of which JAR files are required for your application, refer to Files to be Included with your Java Application

Set the License File

The license unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.

There are two types of runtime licenses:

Note: Adding LEADTOOLS references and setting a license are covered in more detail in the Add References and Set a License tutorial.

Add the Split Pages Code

With the project created, the references added, and the license set, coding can begin.

Open the Main.java class in the Project Explorer. Add the following statements to the import block at the top.

Java
import java.io.IOException; 
import java.nio.file.*; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
 
import leadtools.*; 
import leadtools.codecs.*; 
import leadtools.pdf.*; 
import leadtools.document.*; 
import leadtools.document.converter.*; 
import leadtools.document.writer.*; 
import leadtools.ocr.*; 

Add the code below to the run() method to create the split files directory and call the methods created in the sections below.

Java
private void run(String[] args) { 
   try { 
      Platform.setLibPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64"); 
      Platform.loadLibrary(LTLibrary.LEADTOOLS); 
      Platform.loadLibrary(LTLibrary.CODECS); 
      Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER); 
      Platform.loadLibrary(LTLibrary.PDF); 
      Platform.loadLibrary(LTLibrary.OCR); 
          
      SetLicense(); 
          
      String multipageFile = "C:\\LEADTOOLS23\\Resources\\Images\\leadtools.pdf"; 
      String _splitDir = "C:\\LEADTOOLS23\\Resources\\Images\\Split PDFs"; 
      if(!Files.exists(Paths.get(_splitDir))) 
         Files.createDirectory(Paths.get(_splitDir)); 
      splitUsingRasterCodecs(multipageFile, _splitDir); 
      splitUsingPDFFile(multipageFile, _splitDir); 
      splitUsingLEADDocument(multipageFile, _splitDir);         
   } catch (Exception ex) { 
      System.err.println(ex.getMessage()); 
      ex.printStackTrace(); 
   } 
} 
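The directory check in run() can also be written with Files.createDirectories, which creates any missing parent directories and does not throw if the directory already exists. The sketch below uses only the JDK; the class name SplitDirectory and the method name ensure are illustrative and are not part of the tutorial project or of LEADTOOLS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the tutorial project.
final class SplitDirectory {
   // Creates the directory (and any missing parents) if needed and returns its path.
   // Files.createDirectories does not fail when the directory already exists,
   // so no separate Files.exists check is required.
   static Path ensure(String directory) {
      try {
         return Files.createDirectories(Paths.get(directory));
      } catch (IOException ex) {
         // Rethrown unchecked so callers do not need a throws clause.
         throw new UncheckedIOException(ex);
      }
   }
}
```

With this helper, the two-line exists/createDirectory check in run() could be replaced by a single call such as SplitDirectory.ensure(_splitDir).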

Three different techniques for splitting the pages of a PDF file are discussed below. Each has its own advantages.

Method 1: Use RasterCodecs

In this approach, each page is loaded as a raster (bitmap) image, then saved as a raster PDF file. This is done using the RasterCodecs class.

The main advantage of this approach is code simplicity. It only takes a few lines of code, and the exact same code can be used for other multipage formats such as TIFF or GIF.

Create a new method in the _Main class named splitUsingRasterCodecs(String inputFile, String _directory). This method will be called inside the run() method, as shown above.

Java
void splitUsingRasterCodecs(String inputFile, String _directory) { 
   RasterCodecs codecs = new RasterCodecs(); 
   codecs.getOptions().getPdf().setInitialPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64"); 
   int totalPages = codecs.getTotalPages(inputFile); 
   System.out.println("SplitUsingRasterCodecs..\nTotal pages:" + totalPages + " Splitting pages:"); 
       
   for (int page = 1; page <= totalPages; page++) { 
      System.out.println(page + ".."); 
      String outputFilename = Paths.get(inputFile).toFile().getName(); 
      if(outputFilename.lastIndexOf('.') != -1) 
         outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.')); 
      outputFilename = outputFilename + "_codecs_page" + page + ".pdf"; 
      String outputFile = _directory + "\\" + outputFilename; 
      RasterImage image = codecs.load(inputFile, page); 
      codecs.save(image, outputFile, RasterImageFormat.RAS_PDF_LZW, 0); 
      image.dispose(); 
   } 
   codecs.dispose(); 
} 
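Each split method in this tutorial builds its output file name with the same steps: take the input file name, strip the extension, then append a tag and the page number. As a possible refactoring, those steps could live in one helper. The sketch below uses only the JDK; the class name SplitNames, the method name outputFile, and the tag parameter are illustrative assumptions and are not part of the tutorial project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the tutorial project.
final class SplitNames {
   // Builds "<directory>/<base>_<tag>_page<N>.pdf", where <base> is the input
   // file name without its extension. Assumes inputFile names a file, not a root.
   static String outputFile(String inputFile, String directory, String tag, int page) {
      String name = Paths.get(inputFile).getFileName().toString();
      int dot = name.lastIndexOf('.');
      if (dot != -1)
         name = name.substring(0, dot); // drop the extension, if any
      // resolve() inserts the platform's separator instead of a hard-coded "\\"
      Path output = Paths.get(directory).resolve(name + "_" + tag + "_page" + page + ".pdf");
      return output.toString();
   }
}
```

For example, SplitNames.outputFile(inputFile, _directory, "codecs", page) would produce the same name that the loop above assembles by hand.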

Method 2: Use PDFFile

In this approach, the PDFFile class is used, which is a dedicated class for the PDF format. This means the code cannot be used with other document or image formats.

The main advantage of this approach is that it preserves the contents of PDF pages since it does not convert searchable text to raster images. Additionally, in many cases it does not cause re-encoding of images that exist in the original PDF file, which improves performance and maintains image quality. The code is also very simple.

Create a new method in the _Main class named splitUsingPDFFile(String inputFile, String _directory). This method will be called inside the run() method, as shown above.

Java
void splitUsingPDFFile(String inputFile, String _directory) { 
   PDFFile pdfFile = new PDFFile(inputFile); 
   int totalPages = pdfFile.getPageCount(); 
   System.out.println("SplitUsingPDFFile..\nTotal pages:" + totalPages + " Splitting pages:"); 
   for (int page = 1; page <= totalPages; page++) { 
      System.out.println(page + ".."); 
      String outputFilename = Paths.get(inputFile).toFile().getName(); 
      if(outputFilename.lastIndexOf('.') != -1) 
         outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.')); 
      outputFilename = outputFilename + "_pdffile_page" + page + ".pdf"; 
      String outputFile = _directory + "\\" + outputFilename; 
      pdfFile.extractPages(page, page, outputFile); 
   } 
} 

Method 3: Use LEADDocument

This approach is the most advanced of the three and it utilizes the LEADDocument and DocumentConverter classes.

Since these classes are versatile for use with different formats, similar code can be used for splitting many types of document files and outputting to different document and raster formats. For example, in the code below, simply changing DocumentFormat.PDF to become DocumentFormat.DOCX will split the file into Microsoft Word output pages instead of PDF pages. Additionally, these powerful classes produce optimized output files.

Configure Executor Service

Because document conversion jobs are asynchronous, a Java ExecutorService must be configured and assigned through the RasterDefaults class.

Define an ExecutorService field in the _Main class and update the run() method as shown below:

Java
private ExecutorService service; 

private void run(String[] args) { 
   try { 
      Platform.setLibPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64"); 
      Platform.loadLibrary(LTLibrary.LEADTOOLS); 
      Platform.loadLibrary(LTLibrary.CODECS); 
      Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER); 
      Platform.loadLibrary(LTLibrary.PDF); 
      Platform.loadLibrary(LTLibrary.OCR); 
          
      SetLicense(); 
          
      // Set ExecutorService in RasterDefaults for Document Converter Jobs 
      service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); 
      RasterDefaults.setExecutorService(service); 
          
      String multipageFile = "C:\\LEADTOOLS23\\Resources\\Images\\leadtools.pdf"; 
      String _splitDir = "C:\\LEADTOOLS23\\Resources\\Images\\Split PDFs"; 
      if(!Files.exists(Paths.get(_splitDir))) 
         Files.createDirectory(Paths.get(_splitDir)); 
      splitUsingRasterCodecs(multipageFile, _splitDir); 
      splitUsingPDFFile(multipageFile, _splitDir); 
      splitUsingLEADDocument(multipageFile, _splitDir); 
   } catch (Exception ex) { 
      System.err.println(ex.getMessage()); 
      ex.printStackTrace(); 
   } 
} 
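The threads created by Executors.newFixedThreadPool are non-daemon threads, so the JVM can stay alive after run() returns unless the pool is shut down. The sketch below shows one way to manage that lifecycle using only the JDK. The class name ExecutorLifecycle, the method name runAndShutdown, and the 30-second timeout are illustrative assumptions; whether LEADTOOLS needs additional cleanup is not covered here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the tutorial project.
final class ExecutorLifecycle {
   // Submits the work to a fixed-size pool, then shuts the pool down.
   // Returns true if the pool terminated within the timeout.
   static boolean runAndShutdown(Runnable work) {
      ExecutorService service =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
      try {
         // In the tutorial, RasterDefaults.setExecutorService(service)
         // and the split method calls would go here instead.
         service.submit(work);
      } finally {
         service.shutdown(); // stop accepting new tasks; already submitted tasks still run
      }
      try {
         return service.awaitTermination(30, TimeUnit.SECONDS);
      } catch (InterruptedException ex) {
         Thread.currentThread().interrupt(); // preserve the interrupt status
         service.shutdownNow();
         return false;
      }
   }
}
```

In the tutorial's run() method, the equivalent would be calling service.shutdown() in a finally block after the split methods return.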

DocumentConverter Code

Create a new method in the _Main class named splitUsingLEADDocument(String inputFile, String _directory). This method will be called inside the run() method, as shown above.

Java
void splitUsingLEADDocument(String inputFile, String _directory) { 
   DocumentWriter documentWriter = new DocumentWriter(); 
   // Optional: use documentWriter.getOptions() and documentWriter.setOptions() to modify PDF options  
   var createOptions = new CreateDocumentOptions(); 
       
   LEADDocument inputDocument = DocumentFactory.loadFromFile(inputFile, new LoadDocumentOptions()); 
   OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
   ocrEngine.startup(null, null, null, "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"); 
   System.out.println("SplitUsingLEADDocument..\nTotal pages:" + inputDocument.getPages().getOriginalPageCount() + " Splitting pages:"); 
   for(var inputPage : inputDocument.getPages()) { 
      LEADDocument pageDocument = DocumentFactory.create(createOptions); 
      pageDocument.setAutoDisposeDocuments(true); 
      pageDocument.setName("VirtualPage"); 
      pageDocument.getPages().add(inputPage); 
      DocumentConverter docConverter = new DocumentConverter(); 
      docConverter.setDocumentWriterInstance(documentWriter); 
      int page = inputDocument.getPages().indexOf(inputPage) + 1; // (+ 1) since index is zero-based 
      System.out.println(page + ".."); 
      var jobData = new DocumentConverterJobData(); 
      jobData.setDocument(pageDocument); // convert the single-page virtual document, not the page object 
      String outputFilename = Paths.get(inputFile).toFile().getName(); 
      if(outputFilename.lastIndexOf('.') != -1) 
         outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.')); 
      outputFilename = outputFilename + "_LeadDoc_page" + page + ".pdf"; 
      String outputFile = _directory + "\\" + outputFilename; 
      jobData.setOutputDocumentFileName(outputFile); 
      jobData.setDocumentFormat(DocumentFormat.PDF); 
      var job = docConverter.getJobs().createJob(jobData); 
      docConverter.getJobs().runJob(job); 
   } 
   System.out.println(); 
   ocrEngine.shutdown(); 
} 

Handling Streams

To handle the files using I/O streams, add an import for the java.io.InputStream class to the import block at the top.

Java
import java.io.IOException; 
import java.io.InputStream; 
import java.nio.file.*; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
 
import leadtools.*; 
import leadtools.codecs.*; 
import leadtools.pdf.*; 
import leadtools.document.*; 
import leadtools.document.converter.*; 
import leadtools.document.writer.*; 
import leadtools.ocr.*; 

Replace the existing code in the run() method with the following:

Java
 private void run(String[] args) { 
        try { 
            Platform.setLibPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64"); 
            Platform.loadLibrary(LTLibrary.LEADTOOLS); 
            Platform.loadLibrary(LTLibrary.CODECS); 
            Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER); 
            Platform.loadLibrary(LTLibrary.PDF); 
            Platform.loadLibrary(LTLibrary.OCR); 
 
            SetLicense(); 
 
            // Set ExecutorService in RasterDefaults for Document Converter Jobs 
            service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); 
            RasterDefaults.setExecutorService(service); 
 
            String multipageFile = "C:\\LEADTOOLS23\\Resources\\Images\\leadtools.pdf"; 
            InputStream multipageInputStream = Files.newInputStream(Paths.get(multipageFile)); 
            LeadDynamicStream multipageLeadDynamicStream = new LeadDynamicStream(multipageInputStream, false); 
            splitUsingRasterCodecs(multipageLeadDynamicStream); 
            splitUsingLEADDocument(multipageLeadDynamicStream); 
 
        } catch (Exception ex) { 
            System.err.println(ex.getMessage()); 
            ex.printStackTrace(); 
        } 
    } 

Add the splitUsingRasterCodecs method overload which handles an ILeadStream object.

Java
void splitUsingRasterCodecs(ILeadStream inputStream) { 
   RasterCodecs codecs = new RasterCodecs(); 
   codecs.getOptions().getPdf().setInitialPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64"); 
   int totalPages = codecs.getTotalPages(inputStream); 
   System.out.println("SplitUsingRasterCodecs..\nTotal pages:" + totalPages + " Splitting pages:"); 
 
   for (int page = 1; page <= totalPages; page++) { 
      System.out.println(page + ".."); 
      RasterImage image = codecs.load(inputStream, page); 
      LeadDynamicStream leadDynamicStream = new LeadDynamicStream(); 
      codecs.save(image, leadDynamicStream, RasterImageFormat.RAS_PDF_LZW, 0); 
      // Use output Stream containing the split file before it is closed and freed for the next page 
      leadDynamicStream.close(); 
      leadDynamicStream.dispose(); 
      image.dispose(); 
   } 
   System.out.println(); 
   codecs.dispose(); 
} 

Add the splitUsingLEADDocument method overload which handles an ILeadStream object.

Java
void splitUsingLEADDocument(ILeadStream inputStream) { 
    DocumentWriter documentWriter = new DocumentWriter(); 
    // Optional: use documentWriter.getOptions() and documentWriter.setOptions() to modify PDF options 
    var createOptions = new CreateDocumentOptions(); 
 
    LEADDocument inputDocument = DocumentFactory.loadFromStream(inputStream, new LoadDocumentOptions()); 
    OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
    ocrEngine.startup(null, null, null, "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"); 
    System.out.println("SplitUsingLEADDocument..\nTotal pages:" + inputDocument.getPages().getOriginalPageCount() 
            + " Splitting pages:"); 
    for (var inputPage : inputDocument.getPages()) { 
        LEADDocument pageDocument = DocumentFactory.create(createOptions); 
        pageDocument.setAutoDisposeDocuments(true); 
        pageDocument.setName("VirtualPage"); 
        pageDocument.getPages().add(inputPage); 
        DocumentConverter docConverter = new DocumentConverter(); 
        docConverter.setDocumentWriterInstance(documentWriter); 
        int page = inputDocument.getPages().indexOf(inputPage) + 1; // (+ 1) since index is zero-based 
        System.out.println(page + ".."); 
        var jobData = new DocumentConverterJobData(); 
        jobData.setDocument(pageDocument); 
        jobData.setOutputDocumentStream(new LeadDynamicStream()); 
        jobData.setDocumentFormat(DocumentFormat.PDF); 
        jobData.setJobName("LeadDoc_page" + page); 
        var job = docConverter.getJobs().createJob(jobData); 
        Jobs_JobCompleted jobsCompleted = new Jobs_JobCompleted(); 
        docConverter.getJobs().addJobCompletedListener(jobsCompleted); 
        docConverter.getJobs().runJob(job); 
    } 
} 

Add the Jobs_JobCompleted event listener class that will handle the asynchronous jobs from the Document Converter above to access the output document stream.

Java
class Jobs_JobCompleted implements DocumentConverterJobEventListener { 
    public void onEvent(DocumentConverterJobEvent e) { 
        if (e.getOperation() == DocumentConverterJobOperation.COMPLETED) { 
            LeadDynamicStream outputDocumentStream = (LeadDynamicStream) e.getJob().getJobData().getOutputDocumentStream(); 
            // Use Output Document Stream containing the split file before it is closed and freed for the next page 
            outputDocumentStream.close(); 
            outputDocumentStream.dispose(); 
        } 
    } 
} 

Run the Project

Run the project by pressing Ctrl + F11, or by selecting Run -> Run.

If the steps were followed correctly, the application runs and saves each page of leadtools.pdf as a separate PDF file using each of the three techniques, with the page number appended to the file name.
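One quick way to confirm the result is to list the PDF files in the output directory after the run. The sketch below uses only the JDK; the class name SplitOutputCheck and the method name listPdfNames are illustrative and are not part of the tutorial project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the tutorial project.
final class SplitOutputCheck {
   // Returns the sorted names of the .pdf files directly inside the directory.
   static List<String> listPdfNames(String directory) {
      // Files.list returns a stream that holds an open directory handle,
      // so it is closed with try-with-resources.
      try (Stream<Path> entries = Files.list(Paths.get(directory))) {
         return entries
               .map(p -> p.getFileName().toString())
               .filter(n -> n.toLowerCase().endsWith(".pdf"))
               .sorted()
               .collect(Collectors.toList());
      } catch (IOException ex) {
         throw new UncheckedIOException(ex);
      }
   }
}
```

Calling SplitOutputCheck.listPdfNames on the Split PDFs directory and printing the result should show one file per page for each technique that was run.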

Wrap-up

This tutorial showed how to add the necessary references, load the pages of a PDF file, and split them into separate documents using the RasterCodecs, PDFFile, and LEADDocument classes.

Help Version 23.0.2024.3.11
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.