Automatically Annotate and Bookmark PDF Files using LEADTOOLS

Whether you’re creating contracts for business-to-business agreements or putting together a searchable PDF repository, finding a fast and efficient PDF viewer is critical. Tasks that are usually tedious and time-consuming with other PDF viewers on the market (yes, even Adobe!) are easily automated using the LEADTOOLS Document Converter. This includes the ability to extract text using OCR, annotate, bookmark, and add searchability to PDFs, all without any manual input!

In this post, we show how to use LEADTOOLS to automate the processing of your PDF files. To start, we just need to create a few simple methods using .NET 6 and a Microsoft Worker Service.

Worker Method

The Worker constructor is where we set up the basic necessities the project needs to run. First, we call the SetLicense method to set our LEADTOOLS SDK license. Then we create and start up the OCR Engine and the Document Converter. The last step is to create the directory where the worker will look for files to convert, as well as the output directory for the converted files.


public Worker(ILogger<Worker> logger)
{
	_logger = logger;
	SetLicense();
	OcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
	OcrEngine.Startup(null, null, null, null);
	
	docConverter = new DocumentConverter();
	docConverter.SetOcrEngineInstance(OcrEngine, false);
	docConverter.SetAnnRenderingEngineInstance(new AnnDrawRenderingEngine());
	docConverter.SetDocumentWriterInstance(new DocumentWriter());
	
	PdfDocumentOptions pdfOpts = docConverter.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions;
	pdfOpts.ImageOverText = true;
	docConverter.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOpts);

	if (!Directory.Exists(dropDirectory))
		Directory.CreateDirectory(dropDirectory);
	
	if (!Directory.Exists(outputDirectory))
		Directory.CreateDirectory(outputDirectory);
}

ExecuteAsync

ExecuteAsync is our actual task method, and it is very simple: it sets up the FileSystemWatcher that ‘watches’ a specified folder for new files. When a new file is detected, the watcher fires the Watcher_Created handler, which in turn calls the Converter method with the new file.


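protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
	// Keep the watcher alive (and disposed properly) for the lifetime of the service
	using var watcher = new FileSystemWatcher(dropDirectory);
	watcher.Created += Watcher_Created;
	watcher.EnableRaisingEvents = true;

	// Wait here until the host requests shutdown
	await Task.Delay(Timeout.Infinite, stoppingToken);
}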

Watcher_Created is the method that runs whenever the FileSystemWatcher spots a new file that has been copied or moved into the specified folder. It calls the IsFileLocked method repeatedly until that method returns false, and then runs the Converter method.


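private void Watcher_Created(object sender, FileSystemEventArgs e)
{
	if (e.ChangeType == WatcherChangeTypes.Created)
	{
		var file = new FileInfo(e.FullPath);

		// Poll until the copy/move has finished and the file can be opened
		while (IsFileLocked(file))
			Thread.Sleep(100);

		// Converter expects an array of file paths
		Converter(new[] { e.FullPath });
	}
}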

The IsFileLocked method checks whether the newly added file in the watched folder has finished copying/moving. It returns true while the file is still unavailable, so it is called until it returns false, which indicates that the file has finished copying/moving.


private bool IsFileLocked(FileInfo file)
{
	FileStream stream = null;
	try
	{
		stream = file.Open(FileMode.Open, FileAccess.ReadWrite, FileShare.None);
	}
	catch (IOException)
	{
		//the file is unavailable because it is:
		//still being written to
		//or being processed by another thread
		//or does not exist (has already been processed)
		return true;
	}
	finally
	{
		if (stream != null)
			stream.Close();
	}
	//file is not locked
	return false;
}
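
The polling idea above can also be written with a time limit, so that a file that never becomes available does not stall the service forever. The following is only a sketch that uses standard .NET APIs; the FileReadiness class, the WaitUntilUnlocked method, and its timeout and pollInterval parameters are hypothetical names introduced here for illustration and are not part of the original sample or of LEADTOOLS.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper (not part of the original sample): polls with a deadline.
public static class FileReadiness
{
    // Returns true once the file can be opened exclusively.
    // Returns false if the file does not exist or the timeout expires first.
    public static bool WaitUntilUnlocked(string path, TimeSpan timeout, TimeSpan pollInterval)
    {
        DateTime deadline = DateTime.UtcNow + timeout;

        while (true)
        {
            try
            {
                // Opening with FileShare.None fails while another handle is still open
                using var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
                return true;
            }
            catch (FileNotFoundException)
            {
                // The file was removed (for example, already processed): stop waiting
                return false;
            }
            catch (IOException)
            {
                // Still being written to or held by another process: try again below
            }

            if (DateTime.UtcNow >= deadline)
                return false;

            Thread.Sleep(pollInterval);
        }
    }
}
```

A caller could then skip a file when WaitUntilUnlocked(path, TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(250)) returns false, instead of waiting indefinitely. Exceptions that are not IOException, such as UnauthorizedAccessException for a read-only file, are deliberately left to propagate in this sketch.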

Converter Method

With only four lines of code, the Converter method creates a new output file path from the current date and time and then sends the array of files and the new path to ConvertDocument. Lastly, it deletes the source files from the watched folder to clear out the queue, so they are not converted again.


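public static void Converter(string[] filesToConvert)
{
	// HH is the 24-hour clock, so morning and afternoon runs never share a file name
	string outputPdfFile = Path.Combine(outputDirectory, $@"{DateTime.Now:yyyy.MM.dd.HH.mm.ss}.pdf");

	ConvertDocument(filesToConvert, outputPdfFile);

	// Remove the source files from the queue once they have been converted
	foreach (var file in filesToConvert)
		File.Delete(file);
}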
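
Because the output name is built from a timestamp with one-second resolution, two files converted within the same second would target the same path. The sketch below shows one possible way to keep names unique using only standard .NET APIs. The OutputNames class and the BuildOutputPath method are hypothetical names introduced for illustration, and the numeric suffix scheme is an assumption rather than something the original sample does.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper (not part of the original sample).
public static class OutputNames
{
    // Builds "<outputDirectory>/<yyyy.MM.dd.HH.mm.ss>.pdf" and appends _1, _2, ...
    // if a file with that name already exists.
    public static string BuildOutputPath(string outputDirectory, DateTime timestamp)
    {
        // HH = 24-hour clock; the invariant culture keeps the digits stable across locales
        string stamp = timestamp.ToString("yyyy.MM.dd.HH.mm.ss", CultureInfo.InvariantCulture);
        string candidate = Path.Combine(outputDirectory, stamp + ".pdf");

        int suffix = 1;
        while (File.Exists(candidate))
        {
            candidate = Path.Combine(outputDirectory, $"{stamp}_{suffix}.pdf");
            suffix++;
        }

        return candidate;
    }
}
```

For example, with an empty output directory, a timestamp of March 5, 2024 at 2:07:09 PM yields a file named 2024.03.05.14.07.09.pdf. Note that the existence check and the later write are not atomic, so this is only adequate when a single worker writes to the output directory.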

Document Converter

ConvertDocument does three things. It creates the cache that is needed to load the files and set the annotations for each page. It creates the LEADDocument (which encapsulates a multipage document with support for raster and SVG images, annotations, bookmarks, and text data) into which the files are merged. Finally, it runs the DocumentConverter, which makes the text searchable and embeds the annotations in the newly created PDF file, and then calls BookmarkDocument to add the bookmarks.


static void ConvertDocument(string[] filesToConvert, string outputFile)
{
	var cache = GetCache();

	var docOptions = new CreateDocumentOptions() { Cache = cache };

	var leadDocument = DocumentFactory.Create(docOptions);

	var options = new LoadDocumentOptions() { Cache = cache};

	foreach (var file in filesToConvert)
	{
		var doc = DocumentFactory.LoadFromFile(file, options);
		leadDocument.Pages.AddRange(doc.Pages);
	}

	leadDocument.AutoDeleteFromCache = false;

	leadDocument.Text.OcrEngine = OcrEngine;

	AddFooterAnnotation(leadDocument);

	var job = docConverter.Jobs.CreateJob(new DocumentConverterJobData
	{
		Document = leadDocument,
		DocumentFormat = DocumentFormat.Pdf,
		AnnotationsMode = DocumentConverterAnnotationsMode.Embed,
		OutputDocumentFileName = outputFile,
	});

	docConverter.Jobs.RunJob(job);

	var pdfDocument = job.OutputFiles.FirstOrDefault();
	BookmarkDocument(pdfDocument);
}

Get Cache

GetCache creates a LEADTOOLS FileCache object, returned as an ObjectCache, that stores its data in the specified directory. Any existing cache directory is deleted and recreated first.


static ObjectCache GetCache()
{
	// Create a LEADTOOLS FileCache object 
	var cacheDir = @"C:\Temp\cache";
	if (Directory.Exists(cacheDir))
		Directory.Delete(cacheDir, true);

	Directory.CreateDirectory(cacheDir);

	var cache = new FileCache();
	cache.CacheDirectory = cacheDir;

	return cache;
}

Add Footer Annotations

AddFooterAnnotation does exactly what it sounds like! It goes through each page of each file and applies a simple text annotation in the footer stating it was generated automatically on the date and time it was made.


static void AddFooterAnnotation(LEADDocument leadDocument)
{
	foreach (var documentPage in leadDocument.Pages)
	{
		documentPage.Document.IsReadOnly = false;
		var annContainer = documentPage.GetAnnotations(true);

		var footer = new AnnTextObject
		{
			Text = $"This PDF file was generated automatically by an automated process on {DateTime.Now:G}",
			HorizontalAlignment = AnnHorizontalAlignment.Center,
			TextForeground = AnnSolidColorBrush.Create("black"),
			TextBackground = AnnSolidColorBrush.Create("white"),
			Stroke = null,
			Fill = AnnSolidColorBrush.Create("transparent"),
			Font = new AnnFont("arial", 10),
			Rect = LeadRectD.Create(100, documentPage.Size.Height - 200, documentPage.Size.Width * 3 / 4, 215)
		};

		annContainer.Children.Add(footer);
		documentPage.SetAnnotations(annContainer);
		documentPage.Document.IsReadOnly = true;
	}
}

Add Bookmarks

BookmarkDocument loops through each page of the converted PDF, much like AddFooterAnnotation, and creates a bookmark for each one, so that when you view the output PDF file, you can jump to any page with one click.


static void BookmarkDocument(string pdfFile)
{
	// Create a version of the source file with a few bookmarks
	var file = new PDFFile(pdfFile);
	file.Load();
	var bookmarks = new List<PDFBookmark>();

	for (int i = 0; i < file.Pages.Count; i++)
	{
		PDFFilePage page = file.Pages[i];

		var bookmark = new PDFBookmark
		{
			Title = "Page " + page.PageNumber.ToString(),
			BookmarkStyle = PDFBookmarkStyle.Plain,
			Level = 0,
			TargetPageNumber = page.PageNumber,
			TargetPageFitType = PDFPageFitType.Default,
			TargetPosition = new PDFPoint(0, page.Height),
			TargetZoomPercent = 0
		};
		bookmarks.Add(bookmark);
	}

	var pdfDocument = new PDFDocument(pdfFile);
	pdfDocument.ParsePages(PDFParsePagesOptions.Annotations, 1, -1);
	var annList = new List<PDFAnnotation>();
	Console.WriteLine(pdfDocument.Pages.Count);
	foreach (var page in pdfDocument.Pages)
	{
		annList.AddRange(page.Annotations);
		Console.WriteLine(annList.Count);
	}

	file.WriteBookmarks(bookmarks, pdfFile);
	file.WriteAnnotations(annList, pdfFile);            
}

You now have a Microsoft Worker Service that automates bookmarking, annotating, and creating searchable PDF files using the LEADTOOLS SDK. You can also turn this into a Microsoft Windows Service.
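
As a rough sketch of the Windows Service option, the Program.cs below registers the worker with the generic host and opts in to Windows Service hosting. It assumes that the Worker class from this post lives in the same project and that the Microsoft.Extensions.Hosting.WindowsServices NuGet package has been added; neither detail is shown in the original sample.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Assumption: Worker is the BackgroundService built in this post.
IHost host = Host.CreateDefaultBuilder(args)
    // Has an effect only when the process is started by the Windows Service Control Manager;
    // when launched from a console, the app runs as a normal console host.
    .UseWindowsService()
    .ConfigureServices(services =>
    {
        services.AddHostedService<Worker>();
    })
    .Build();

await host.RunAsync();
```

After publishing, the executable can be registered as a service with the standard Windows sc.exe create command.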

Free 60-Day Evaluation SDK

We offer a FREE 60-day evaluation so you can test all these features and program with LEADTOOLS before a purchase is even made. Gain access to our extensive documentation, sample source code, demos, and tutorials.

LEAD Is Here to Help!

Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team via email or call us at 704-332-5532.
