Merge Documents with LEADDocument - Python

This tutorial shows how to create a new LEADDocument that loads and merges the documents in a directory and writes the result to a stream in a Python application using the LEADTOOLS SDK.

Overview

Summary: This tutorial covers how to merge documents using LEADDocument in a Python application.
Completion Time: 30 minutes
Visual Studio Project: Download tutorial project (2 KB)
Platform: Python Console Application
IDE: Visual Studio 2022
Runtime Target: Python 3.10 or higher
Development License: Download LEADTOOLS

Required Knowledge

Before working on the Merge Documents with LEADDocument - Python tutorial, you need to be familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial.

Create the Project and Add the LEADTOOLS References

Start with a copy of the project created in the Add References and Set a License for Python topic.

If you do not have that project, follow the steps in the relevant tutorial to create it.

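At a minimum, this tutorial requires the .NET DLLs referenced in the code below: Leadtools, Leadtools.Ocr, Leadtools.Document, Leadtools.Document.Writer, and Leadtools.Document.Converter.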

For a complete list of which Codecs DLLs are required for specific formats, refer to File Format Support.

Set the License File

The License unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.

There are two types of runtime licenses:

Create LEADDocument and Merge Documents Code

With the project created, the references added, and the license set, coding can begin.

In the Solution Explorer, open Project-Name.py and place the following references below the "Add references to LEADTOOLS" comment:

# Add references to LEADTOOLS 
from leadtools import LibraryLoader 
LibraryLoader.add_reference("Leadtools") 
from Leadtools import * 
LibraryLoader.add_reference("Leadtools.Ocr") 
from Leadtools.Ocr import * 
LibraryLoader.add_reference("Leadtools.Document") 
from Leadtools.Document import * 
LibraryLoader.add_reference("Leadtools.Document.Writer") 
from Leadtools.Document.Writer import * 
LibraryLoader.add_reference("Leadtools.Document.Converter") 
from Leadtools.Document.Converter import * 
 
from System.IO import * 

Add the code below to the main() function to start the IOcrEngine, merge the PDF documents, and write the merged PDF from the returned stream to disk.

def main(): 
 
    Support.set_license(os.path.join(DemosTools.get_root(), "C:/LEADTOOLS23/Support/Common/License")) 
 
    folder = r"C:\LEADTOOLS23\Resources\Images" 
    ocr_engine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD) 
    ocr_engine.Startup(None, None, None, r"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime") 
    ms = merge_pdf_files(folder, ocr_engine) 
    ms.Position = 0 
    # ToArray() returns only the bytes written; GetBuffer() can include unused buffer capacity
    File.WriteAllBytes(r"C:\LEADTOOLS23\Resources\Images\merged.pdf", ms.ToArray())

Add a new function called merge_pdf_files(dir, ocr_engine) that returns the stream containing the merged PDF document.

def merge_pdf_files(dir, ocr_engine): 
 
    document_writer = DocumentWriter() 
 
    # Get the current PDF options
    pdf_options = document_writer.GetOptions(DocumentFormat.Pdf)

    # Change the options first, then apply them to the writer
    pdf_options.ImageOverText = True
    document_writer.SetOptions(DocumentFormat.Pdf, pdf_options)

    output_stream = MemoryStream() 
    create_options = CreateDocumentOptions() 
    virtual_document = DocumentFactory.Create(create_options) 
    virtual_document.AutoDisposeDocuments = True 
    virtual_document.Name = "Virtual" 
 
    files = Directory.GetFiles(dir, "*.pdf") 
 
    options = LoadDocumentOptions() 
 
    for file in files: 
        child_document = DocumentFactory.LoadFromFile(file, options) 
        virtual_document.Pages.AddRange(child_document.Pages) 
     
    # Convert virtual_document using the DocumentConverter to finalize the document and gather the stream 
    document_converter = DocumentConverter() 
    document_converter.SetOcrEngineInstance(ocr_engine, False) 
    document_converter.SetDocumentWriterInstance(document_writer) 
    job_data = DocumentConverterJobData() 
    job_data.Document = virtual_document 
    job_data.OutputDocumentStream = output_stream 
    job_data.DocumentFormat = DocumentFormat.Pdf 
 
    job = document_converter.Jobs.CreateJob(job_data) 
    document_converter.Jobs.RunJob(job) 
    if job.Status == DocumentConverterJobStatus.Success:
        print("Success!") 
    else: 
        print(f"{job.Status} Errors") 
        for error in job.Errors: 
            print(f"{error.Operation}, at {error.InputDocumentPageNumber}: {error.Error.Message}") 
    
    return output_stream 
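
The order in which Directory.GetFiles returns file names is not guaranteed, and because main() writes merged.pdf into the same folder it reads from, a second run would pick up the previous output as an input. The following is a minimal sketch, using only the Python standard library, of one way to build a stable input list. The helper name list_pdf_inputs and its exclude parameter are hypothetical and are not part of the LEADTOOLS SDK.

```python
from pathlib import Path


def list_pdf_inputs(folder, exclude=("merged.pdf",)):
    """Return the PDF paths in folder, sorted by name, skipping excluded names.

    Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
    """
    excluded = {name.lower() for name in exclude}
    paths = [
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and p.name.lower() not in excluded
    ]
    # Sort case-insensitively so the page order of the merged result is reproducible.
    return [str(p) for p in sorted(paths, key=lambda p: p.name.lower())]
```

Under these assumptions, the returned list could stand in for the files sequence inside merge_pdf_files, since each entry is a plain path string.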

Note: Adding pages from child documents to a Virtual Document does not copy them. The source pages still exist only in the location they were originally loaded from. The Virtual Document contains only the information about where each page exists, along with other metadata about the page and file. From there, two options are available:

  • The Virtual Document can be displayed in a Document Viewer.

  • Or, the Virtual Document can be finalized by using the Document Converter, as illustrated in this tutorial. This creates a new, standalone document that contains copies of the source pages in its own document structure.
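
The distinction described in the note can be modeled in a few lines of plain Python. This is a conceptual sketch only: PageRef, VirtualDocument, and finalize are hypothetical names that mimic the idea of storing page references and copying the pages on finalization. They are not LEADTOOLS classes and do not reflect how the SDK is implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageRef:
    """Points at a page inside a source file; holds no page content itself."""
    source: str
    page_number: int  # 1-based


@dataclass
class VirtualDocument:
    """Hypothetical stand-in: an ordered list of references to pages elsewhere."""
    name: str
    pages: list = field(default_factory=list)

    def add_range(self, refs):
        self.pages.extend(refs)


def finalize(virtual_document, sources):
    """Copy every referenced page's content into a new standalone list.

    sources maps a source name to its list of page contents.
    """
    return [sources[ref.source][ref.page_number - 1] for ref in virtual_document.pages]


# Two in-memory "files", each a list of page contents.
sources = {"a.pdf": ["A1", "A2"], "b.pdf": ["B1"]}

virtual = VirtualDocument(name="Virtual")
for source_name, pages in sources.items():
    virtual.add_range(PageRef(source_name, n) for n in range(1, len(pages) + 1))

merged = finalize(virtual, sources)
print(merged)  # ['A1', 'A2', 'B1']
```

Until finalize runs, the virtual object holds only (source, page number) pairs, which is why the sources must stay available while it is in use.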

Run the Project

Run the project by pressing F5, or by selecting Debug -> Start Debugging.

If the steps were followed correctly, the application runs and creates a new virtual LEADDocument. The application then takes each PDF file from the given directory and adds its pages to the virtual LEADDocument. Lastly, it "finalizes" the virtual document by sending it to the Document Converter and writes the resulting stream to merged.pdf.
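
A quick way to confirm that the run produced output is to check that the written file starts with the PDF header bytes. The sketch below uses only the Python standard library. The function name looks_like_pdf is hypothetical, and the check inspects only the first five bytes, so it does not validate the rest of the file.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if path is an existing file that starts with the PDF header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # PDF files begin with the ASCII marker "%PDF-" followed by a version number.
        return f.read(5) == b"%PDF-"
```

For the paths used in this tutorial, the call would be looks_like_pdf(r"C:\LEADTOOLS23\Resources\Images\merged.pdf").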

Wrap-up

This tutorial covered how to use the LEADDocument and DocumentConverter classes.

Help Version 23.0.2024.4.23
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.

