Split a PDF File into Multiple Files - C# .NET 6

This tutorial shows three techniques for saving each page of a multipage PDF as a separate file in a C# Windows Console application using the LEADTOOLS SDK.

Overview
Summary: This tutorial covers how to split multipage PDF files in a C# .NET 6 application.
Completion Time: 30 minutes
Visual Studio Project: Download tutorial project (2 KB)
Platform: C# .NET 6 Console Application
IDE: Visual Studio 2022
Runtime Target: .NET 6 or higher
Development License: Download LEADTOOLS
Try it in another language: C# (.NET 6+ Console), Java
Required Knowledge

Before working on the Split a PDF File into Multiple Files - Console C# tutorial, get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial.

Create the Project and Add LEADTOOLS References

Start with a copy of the project created in the Add References and Set a License tutorial. If that project is unavailable, follow the steps in that tutorial to create it.

The references needed depend upon the purpose of the project.

This tutorial requires the following NuGet package:

Set the License File

The license unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.

There are two types of runtime licenses:

Add the Split Pages Code

With the project created, the references added, and the license set, coding can begin.

In the Solution Explorer, open Program.cs. Add the following statements to the using block at the top of Program.cs:

C#
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Pdf; 
using Leadtools.Document; 
using Leadtools.Document.Converter; 
using Leadtools.Document.Writer; 
using Leadtools.Ocr; 

Add the code below to the Main() method to create the output directory for the split files and call the three split methods defined in the following sections.

C#
static void Main(string[] args) 
{ 
   try 
   { 
      InitLEAD(); 
 
      string multipageFile = @"C:\LEADTOOLS23\Resources\Images\leadtools.pdf"; 
      string _splitDir = @"C:\LEADTOOLS23\Resources\Images\Split PDFs"; 
      if (!Directory.Exists(_splitDir)) 
      { 
         Directory.CreateDirectory(_splitDir); 
      } 
      SplitUsingRasterCodecs(multipageFile, _splitDir); 
      SplitUsingPDFFile(multipageFile, _splitDir); 
      SplitUsingLEADDocument(multipageFile, _splitDir); 
   } 
   catch (Exception ex) 
   { 
      Console.WriteLine(ex.ToString()); 
   } 
   Console.WriteLine("Press any key to exit."); 
   Console.ReadKey(true); 
} 

Three techniques for splitting the pages of a PDF file are discussed below; each has its own advantages.

Method 1: Use RasterCodecs

In this approach, each page is loaded as a raster (bitmap) image, then saved as a raster PDF file. This is done using the RasterCodecs class.

The main advantage of this approach is code simplicity. It only takes a few lines of code, and the exact same code can be used for other multipage formats such as TIFF or GIF.

Create a new method in the Program class named SplitUsingRasterCodecs(string inputFile, string _directory). This method will be called inside the Main() method, as shown above.

C#
static void SplitUsingRasterCodecs(string inputFile, string _directory) 
{ 
   using RasterCodecs codecs = new(); 
   codecs.Options.Pdf.InitialPath = @"C:\LEADTOOLS23\Bin\CDLL\x64"; 
   int totalPages = codecs.GetTotalPages(inputFile); 
   Console.Write($"SplitUsingRasterCodecs..\nTotal pages: {totalPages}, Splitting pages: "); 
 
   for (int page = 1; page <= totalPages; page++) 
   { 
      Console.Write($"{page}.. "); 
      string outputFileName = $"{Path.GetFileNameWithoutExtension(inputFile)}_codecs_page{page}.pdf"; 
      string outputFile = Path.Combine(_directory, outputFileName); 
      using RasterImage image = codecs.Load(inputFile, page); 
      codecs.Save(image, outputFile, RasterImageFormat.RasPdfLzw, 0); 
   } 
   Console.WriteLine(); 
 
} 
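All three methods name their output files the same way: the input file name without its extension, a method tag (_codecs, _pdfFile, or _LeadDoc), and the 1-based page number. The class below is a hypothetical refactoring of that pattern, not a LEADTOOLS API. Its name, SplitNaming.BuildOutputPath, and its extension parameter are assumptions added for illustration; the extension parameter would matter if the LEADDocument method were switched to an output format other than PDF.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of LEADTOOLS): builds output paths such as
// "<directory>/leadtools_codecs_page1.pdf", matching the tutorial's naming scheme.
public static class SplitNaming
{
    // inputFile: the multipage source file
    // tag: method tag such as "codecs", "pdfFile", or "LeadDoc"
    // page: 1-based page number
    // extension: output extension including the leading dot (assumed default ".pdf")
    public static string BuildOutputPath(string inputFile, string directory, string tag, int page, string extension = ".pdf")
    {
        if (page < 1)
            throw new ArgumentOutOfRangeException(nameof(page), "Page numbers are 1-based.");

        string baseName = Path.GetFileNameWithoutExtension(inputFile);
        return Path.Combine(directory, $"{baseName}_{tag}_page{page}{extension}");
    }
}
```

Inside the loop of SplitUsingRasterCodecs, the two lines that build outputFileName and outputFile could then become a single call: string outputFile = SplitNaming.BuildOutputPath(inputFile, _directory, "codecs", page);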

Method 2: Use PDFFile

This approach uses the PDFFile class, which is dedicated to the PDF format. As a result, this code cannot be used with other document or image formats.

The main advantage of this approach is that it preserves the contents of PDF pages since it does not convert searchable text to raster images. Additionally, in many cases it does not cause re-encoding of images that exist in the original PDF file, which improves performance and maintains image quality. The code is also very simple.

Create a new method in the Program class named SplitUsingPDFFile(string inputFile, string _directory). This method will be called inside the Main() method, as shown above.

C#
static void SplitUsingPDFFile(string inputFile, string _directory) 
{ 
   PDFFile pdfFile = new(inputFile); 
   int totalPages = pdfFile.GetPageCount(); 
   Console.Write($"SplitUsingPDFFile..\nTotal pages: {totalPages}, Splitting pages: "); 
   for (int page = 1; page <= totalPages; page++) 
   { 
      Console.Write($"{page}.. "); 
      string outputFileName = $"{Path.GetFileNameWithoutExtension(inputFile)}_pdfFile_page{page}.pdf"; 
      string outputFile = Path.Combine(_directory, outputFileName); 
      pdfFile.ExtractPages(page, page, outputFile); 
   } 
   Console.WriteLine(); 
} 
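ExtractPages takes both a first and a last page, so the same loop could write files containing several pages each instead of one. The sketch below only computes the 1-based, inclusive page ranges; it is a hypothetical helper, not part of LEADTOOLS, and the chunk size pagesPerFile is an assumed parameter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of LEADTOOLS): splits pages 1..totalPages into
// 1-based, inclusive ranges of at most pagesPerFile pages each.
public static class PageRanges
{
    public static List<(int First, int Last)> Chunk(int totalPages, int pagesPerFile)
    {
        if (totalPages < 0) throw new ArgumentOutOfRangeException(nameof(totalPages));
        if (pagesPerFile < 1) throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        var ranges = new List<(int First, int Last)>();
        for (int first = 1; first <= totalPages; first += pagesPerFile)
        {
            // The final range may be shorter than pagesPerFile
            int last = Math.Min(first + pagesPerFile - 1, totalPages);
            ranges.Add((first, last));
        }
        return ranges;
    }
}
```

Inside SplitUsingPDFFile, each range would then be passed as pdfFile.ExtractPages(range.First, range.Last, outputFile). With pagesPerFile set to 1, the ranges reproduce the one-page-per-file output of the tutorial code.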

Method 3: Use LEADDocument

This approach is the most advanced of the three. It uses the LEADDocument and DocumentConverter classes.

Since these classes work with many formats, similar code can split many types of document files and output them to different document and raster formats. For example, changing DocumentFormat.Pdf to DocumentFormat.Docx in the code below splits the file into Microsoft Word pages instead of PDF pages; the hard-coded .pdf extension in OutputDocumentFileName would then also need to change to .docx so the file names match their contents. These classes also produce optimized output files.

Create a new method in the Program class named SplitUsingLEADDocument(string inputFile, string _directory). This method will be called inside the Main() method, as shown above.

C#
static void SplitUsingLEADDocument(string inputFile, string _directory) 
{ 
   DocumentWriter documentWriter = new(); 
   // Optional: use documentWriter.GetOptions() and documentWriter.SetOptions() to modify PDF options 
   var createOptions = new CreateDocumentOptions(); 
 
   using LEADDocument inputDocument = DocumentFactory.LoadFromFile(inputFile, new LoadDocumentOptions()); 
   using IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD); 
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime"); 
 
   // A single converter runs the conversion job for every page 
   using DocumentConverter docConverter = new(); 
   docConverter.SetOcrEngineInstance(ocrEngine, false); 
   docConverter.SetDocumentWriterInstance(documentWriter); 
 
   Console.Write($"SplitUsingLEADDocument..\nTotal pages: {inputDocument.Pages.Count}, Splitting pages: "); 
   foreach (var inputPage in inputDocument.Pages) 
   { 
      // Wrap the current page in a new single-page virtual document 
      LEADDocument pageDocument = DocumentFactory.Create(createOptions); 
      pageDocument.AutoDisposeDocuments = true; 
      pageDocument.Name = "VirtualPage"; 
      pageDocument.Pages.Add(inputPage); 
      int page = inputDocument.Pages.IndexOf(inputPage) + 1; // (+ 1) since index is zero-based 
      Console.Write($"{page}.. "); 
      var jobData = new DocumentConverterJobData 
      { 
         Document = pageDocument, 
         OutputDocumentFileName = Path.Combine(_directory, $"{Path.GetFileNameWithoutExtension(inputFile)}_LeadDoc_page{page}.pdf"), 
         DocumentFormat = DocumentFormat.Pdf 
      }; 
      var job = docConverter.Jobs.CreateJob(jobData); 
      docConverter.Jobs.RunJob(job); 
   } 
   Console.WriteLine(); 
   ocrEngine.Shutdown(); 
} 

Handling Streams

To handle the files using MemoryStream, modify the SplitUsingRasterCodecs and SplitUsingLEADDocument methods, and the code in the Main() method that calls them, as follows:

C#
// The following code goes inside the Main method 
// Note that the PDFFile class does not accept stream input 
byte[] multipageData = File.ReadAllBytes(multipageFile); 
using MemoryStream multipageStream = new(multipageData); 
SplitUsingRasterCodecs(multipageStream); 
// Rewind in case the previous call left the stream position at the end 
multipageStream.Position = 0; 
SplitUsingLEADDocument(multipageStream); 
 
static void SplitUsingRasterCodecs(Stream inputStream) 
{ 
   using RasterCodecs codecs = new(); 
   codecs.Options.Pdf.InitialPath = @"C:\LEADTOOLS23\Bin\CDLL\x64"; 
   int totalPages = codecs.GetTotalPages(inputStream); 
   Console.Write($"SplitUsingRasterCodecs..\nTotal pages: {totalPages}, Splitting pages: "); 
 
   for (int page = 1; page <= totalPages; page++) 
   { 
      Console.Write($"{page}.. "); 
      using RasterImage image = codecs.Load(inputStream, page); 
      using MemoryStream outputStream = new(); 
      codecs.Save(image, outputStream, RasterImageFormat.RasPdfLzw, 0); 
      // Use output Memory Stream containing the split file before it is closed and freed for the next page 
   } 
   Console.WriteLine(); 
} 
 
static void SplitUsingLEADDocument(Stream inputStream) 
{ 
   DocumentWriter documentWriter = new(); 
   // Optional: use documentWriter.GetOptions() and documentWriter.SetOptions() to modify PDF options 
   var createOptions = new CreateDocumentOptions(); 
 
   using LEADDocument inputDocument = DocumentFactory.LoadFromStream(inputStream, new LoadDocumentOptions()); 
   using IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD); 
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime"); 
 
   // A single converter runs every page job; subscribe to JobCompleted once, before the loop 
   using DocumentConverter docConverter = new(); 
   docConverter.SetOcrEngineInstance(ocrEngine, false); 
   docConverter.SetDocumentWriterInstance(documentWriter); 
   docConverter.Jobs.JobCompleted += Jobs_JobCompleted; 
 
   Console.Write($"SplitUsingLEADDocument..\nTotal pages: {inputDocument.Pages.Count}, Splitting pages: "); 
   foreach (var inputPage in inputDocument.Pages) 
   { 
      // Wrap the current page in a new single-page virtual document 
      LEADDocument pageDocument = DocumentFactory.Create(createOptions); 
      pageDocument.AutoDisposeDocuments = true; 
      pageDocument.Name = "VirtualPage"; 
      pageDocument.Pages.Add(inputPage); 
      int page = inputDocument.Pages.IndexOf(inputPage) + 1; // (+ 1) since index is zero-based 
      Console.Write($"{page}.. "); 
      var jobData = new DocumentConverterJobData 
      { 
         Document = pageDocument, 
         OutputDocumentStream = new MemoryStream(), // disposed in Jobs_JobCompleted 
         DocumentFormat = DocumentFormat.Pdf, 
         JobName = "LeadDoc_page" + page 
      }; 
      var job = docConverter.Jobs.CreateJob(jobData); 
      docConverter.Jobs.RunJob(job); 
   } 
   Console.WriteLine(); 
   ocrEngine.Shutdown(); 
} 
 
private static void Jobs_JobCompleted(object sender, DocumentConverterJobEventArgs e) 
{ 
   // Each output stream contains one split page once its conversion job is complete 
   if (e.Job.JobData.OutputDocumentStream is MemoryStream outputStream) 
   { 
      // Use the stream here before disposing it (Dispose also closes the stream) 
      outputStream.Dispose(); 
   } 
} 
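The event handler leaves the use of each output stream open. If the goal is to keep every split page in memory, one possible pattern is to copy the bytes out with MemoryStream.ToArray(), keyed by the job name set in DocumentConverterJobData.JobName, and then dispose the stream. The SplitPageStore class below is a hypothetical sketch, not part of LEADTOOLS; the concurrent dictionary is a defensive assumption in case jobs complete on different threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical sketch (not part of LEADTOOLS): keeps each split page in memory,
// keyed by job name, and disposes the output stream afterwards.
public static class SplitPageStore
{
    // Concurrent dictionary as a precaution in case handlers run on different threads
    public static readonly ConcurrentDictionary<string, byte[]> Pages = new();

    public static void Collect(string jobName, MemoryStream outputStream)
    {
        // ToArray copies the whole stream contents regardless of the current Position
        Pages[jobName] = outputStream.ToArray();
        outputStream.Dispose();
    }
}
```

In Jobs_JobCompleted, the disposal of outputStream would be replaced by SplitPageStore.Collect(e.Job.JobData.JobName, outputStream), so each page is available afterwards as SplitPageStore.Pages["LeadDoc_page1"], and so on.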

Run the Project

Run the project by pressing F5, or by selecting Debug -> Start Debugging.

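If the steps were followed correctly, the application runs and creates new files in the Split PDFs directory. Each of the three methods saves every page of leadtools.pdf as a separate PDF file, with a method tag and the page number appended to the file name (for example, leadtools_codecs_page1.pdf, leadtools_pdfFile_page1.pdf, and leadtools_LeadDoc_page1.pdf).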
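To confirm a run quickly, the output directory can be checked for one file per page from each method. The class below is a hypothetical check, not part of LEADTOOLS. It assumes the tutorial's "name_tag_pageN.pdf" naming and that the directory holds only the output of the latest run.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical check (not part of LEADTOOLS): counts the split files written by each
// method, relying on the "<name>_<tag>_page<N>.pdf" naming used in this tutorial.
public static class SplitOutputCheck
{
    public static int CountPages(string splitDir, string inputFile, string tag)
    {
        string pattern = $"{Path.GetFileNameWithoutExtension(inputFile)}_{tag}_page*.pdf";
        return Directory.GetFiles(splitDir, pattern).Length;
    }

    // True when every method produced exactly expectedPages files
    public static bool AllMethodsAgree(string splitDir, string inputFile, int expectedPages)
    {
        string[] tags = { "codecs", "pdfFile", "LeadDoc" };
        return tags.All(tag => CountPages(splitDir, inputFile, tag) == expectedPages);
    }
}
```

For example, at the end of Main(), SplitOutputCheck.AllMethodsAgree(_splitDir, multipageFile, totalPages) would report whether all three techniques produced the same number of pages, given a totalPages value obtained from one of the split methods.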

Wrap-up

This tutorial showed how to add the necessary references to load all the pages of a PDF file and split them into separate documents using various techniques.


Help Version 23.0.2024.4.23
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.