PowerShell Script: Compare DOC and PDF Files

It is common to ask other departments for feedback on content to make sure you get multiple perspectives on the topic. I cannot count the number of times I have sent a document to another department only to get it back with Track Changes disabled. 😡 When that happens, you can either manually check each and every word for a change like a caveman, or you can use a tool to show the differences. Fortunately, I have LEADTOOLS and PowerShell in my arsenal of tools.

With LEADTOOLS you can compare two DOCX files, two PDF files, or even a PDF to a DOCX. By adding OCR to the mix, you can even compare a photo of a document to a Word document. The following PowerShell function compares the contents of two files. It assumes the LEADTOOLS assemblies are already loaded and the license is set in the current session:

PowerShell 7 script to compare two documents, including PDF. It displays the change report in the default web browser.

function Compare-LtDocuments {
<#
.SYNOPSIS
    Compares two documents and displays the differences in the default web browser

.DESCRIPTION
    Compares two documents, saves the differences as a Markdown report in the current directory, and displays the report in the default web browser. A document can be Word, PDF, TXT, etc.

.PARAMETER DocumentPath1
    Path to the first document to compare

.PARAMETER DocumentPath2
    Path to the second document to compare

.EXAMPLE
    Compare-LtDocuments -DocumentPath1 "c:\temp\a.pdf" -DocumentPath2 "c:\temp\b.pdf"

.INPUTS
    String, String

.OUTPUTS
    Bool

.NOTES
    Author:  LEAD Technologies, Inc.
    Website: https://www.leadtools.com
    Twitter: @leadtools
#>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$DocumentPath1,

        [Parameter(Mandatory)]
        [string]$DocumentPath2
    )

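    if( -not(Test-Path -LiteralPath $DocumentPath1 -PathType Leaf) ) {
        Write-Error "File does not exist: $DocumentPath1"
        return $false
    }

    if( -not(Test-Path -LiteralPath $DocumentPath2 -PathType Leaf) ) {
        Write-Error "File does not exist: $DocumentPath2"
        return $false
    }

    # Resolve to absolute paths: .NET methods resolve relative paths against the
    # process working directory, which can differ from the PowerShell location.
    $DocumentPath1 = Convert-Path -LiteralPath $DocumentPath1
    $DocumentPath2 = Convert-Path -LiteralPath $DocumentPath2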

    $documentLoadOptions = New-Object -TypeName Leadtools.Document.LoadDocumentOptions
    $documentsList = New-Object -TypeName System.Collections.Generic.List[Leadtools.Document.LEADDocument]
    $documentsList.Add([Leadtools.Document.DocumentFactory]::LoadFromFile($DocumentPath1, $documentLoadOptions))
    $documentsList.Add([Leadtools.Document.DocumentFactory]::LoadFromFile($DocumentPath2, $documentLoadOptions))

    $mdReportOptions = New-Object -TypeName Leadtools.Document.Compare.MarkdownReportOptions
    $mdReportOptions.BaseColor = "#777"
    $mdReportOptions.ReportHeaders.Add("Example Report Header")
    $mdReportOptions.ReportFooters.Add("Example Report Footer")

    ForEach ($document in $documentsList){
        $mdReportOptions.DocumentNames.Add($document.Name)
    }

    $documentComparer = New-Object -TypeName Leadtools.Document.Compare.DocumentComparer
    $diffs = $documentComparer.CompareDocument($documentsList)

    $stream = New-Object -TypeName System.IO.MemoryStream
    $diffs.GenerateMarkdownReport($stream, $mdReportOptions)
    $stream.Position = 0
    $currentPath = Get-Location
    Do {
        $randomFileName =  [System.IO.Path]::GetRandomFileName() + ".md"
        $randomFilePath = Join-Path -Path $currentPath -ChildPath $randomFileName
    }
    Until(!(Test-Path $randomFilePath))

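    [System.IO.File]::WriteAllBytes($randomFilePath, $stream.ToArray())

    # Release the stream and the loaded documents.
    $stream.Dispose()
    ForEach ($document in $documentsList) {
        $document.Dispose()
    }

    Show-Markdown -LiteralPath $randomFilePath -UseBrowser

    # Report success, matching the Bool documented under .OUTPUTS.
    return $true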
}
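
The function above saves its report under a random name in the current directory, so repeated runs leave .md files behind wherever you happen to be working. If you would rather keep them out of the way, here is a minimal sketch of the same do/until naming loop pointed at the system temp folder by default. Note that New-ReportFilePath is a hypothetical helper name made up for this example; it is not part of LEADTOOLS or of the script above.

```powershell
function New-ReportFilePath {
    [CmdletBinding()]
    param(
        # Folder for the report; defaults to the system temp folder (an assumption of this sketch).
        [string]$Directory = [System.IO.Path]::GetTempPath(),

        # File extension for the report.
        [string]$Extension = '.md'
    )

    # Keep generating random names until one is found that is not already taken.
    do {
        $name = [System.IO.Path]::GetRandomFileName() + $Extension
        $path = Join-Path -Path $Directory -ChildPath $name
    } until (-not (Test-Path -LiteralPath $path))

    return $path
}

# Example: pick an unused report path in the temp folder.
$reportPath = New-ReportFilePath
Write-Host "Report would be written to $reportPath"
```

The helper only picks a name and does not create the file, so in the script above it would stand in for the Get-Location and Do/Until lines, with the result passed to WriteAllBytes.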

See for yourself – Free evaluation

Download the LEADTOOLS SDK for free. It’s fully functional for 60 days and comes with free chat and email support.

Stay tuned for more PowerShell script samples

Did you see our previous post, “PowerShell Script: Merge PDF Files”? Stay tuned for more PDF examples to see how LEADTOOLS fits into any workflow.

Need help in the meantime?

Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team via email or call us at +1-704-332-5532.
