Linux OCR, Barcode and Format Conversion Batch Processor: 25 Projects in 25 Days

As part of the LEAD Technologies 25th anniversary, we are creating 25 projects in 25 days to celebrate LEAD’s depth of features and ease of use. Today’s project comes from Nathan.

Download the Project

What it Does

This C project will perform OCR, barcode recognition and file conversion using LEADTOOLS Version 19.

Features Used

Development Progress Journal

Hello, my name is Nathan and I am going to write a Linux application that combines OCR, barcode recognition, and file conversion into one batch processing application. It’s been quite a while since I’ve written a program in C so this should be fun!

Since I already have the LEADTOOLS SDK installed, I’m going to start with something simple to get my C gears turning. I’ll start by doing input verification, as I want to make sure the user uses the application properly and print out how to use it if they don’t.

That took a couple hours and I think I have input down pat. I’m going to store all of the options in a struct and then call functions that will do all the LEADTOOLS stuff from a header file if the flags are passed.
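The project's actual source isn't reproduced in this journal, but a minimal sketch of this kind of input verification might look like the following. The struct name, the field names and the `--convert`, `--barcode` and `--ocr` flags are all assumptions made for illustration, not the application's real command line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option set: the names and flags here are illustrative
   assumptions, not the project's real interface. */
typedef struct {
    const char *src_dir;   /* directory to read images from */
    const char *dst_dir;   /* directory to write results to */
    const char *format;    /* target format, only set with --convert */
    int do_convert;
    int do_barcode;
    int do_ocr;
} batch_options;

/* Prints a usage message to stderr. */
void print_usage(const char *prog)
{
    fprintf(stderr,
            "Usage: %s <source_dir> <target_dir> "
            "[--convert <format>] [--barcode] [--ocr]\n", prog);
}

/* Fills opts from argv. Returns 0 on success, -1 on invalid input. */
int parse_args(int argc, char **argv, batch_options *opts)
{
    *opts = (batch_options){0};

    /* Need both directories and at least one flag. */
    if (argc < 4)
        return -1;

    opts->src_dir = argv[1];
    opts->dst_dir = argv[2];

    for (int i = 3; i < argc; i++) {
        if (strcmp(argv[i], "--convert") == 0) {
            if (i + 1 >= argc)      /* --convert requires a format */
                return -1;
            opts->format = argv[++i];
            opts->do_convert = 1;
        } else if (strcmp(argv[i], "--barcode") == 0) {
            opts->do_barcode = 1;
        } else if (strcmp(argv[i], "--ocr") == 0) {
            opts->do_ocr = 1;
        } else {
            return -1;              /* unknown argument */
        }
    }
    return 0;
}
```

A main function would call parse_args first, call print_usage and exit if it returns -1, and otherwise run whichever steps have their flag set in the struct.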

Next I’m going to write a makefile that links all the libraries this program needs, so I don’t have to fiddle with compiler commands anymore and can just type “make.”

Now I’m going to write a function that does file conversion. It’ll take char * arguments for the source and target directories, the intended format, and a struct to communicate the type with LEADTOOLS.

That only took about 45 minutes! I ran into some issues when opening directories, but the whole function only took about 35 lines of code. That includes all the code we need to convert every file in our source directory, which is pretty amazing for C.
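For readers who haven't opened directories in C before, here is a rough sketch of the directory-walking part using the POSIX opendir and readdir calls. It is not the project's code: for_each_file, file_handler and print_pair are hypothetical names, and print_pair simply stands in for whatever LEADTOOLS conversion call the real function makes.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical callback: receives the full source and target paths
   and returns 0 on success. */
typedef int (*file_handler)(const char *src_path, const char *dst_path);

/* Stand-in handler: a real one would call the SDK here. */
int print_pair(const char *src_path, const char *dst_path)
{
    printf("%s -> %s\n", src_path, dst_path);
    return 0;
}

/* Calls handler once per regular file in src_dir.
   Returns the number of files handled successfully,
   or -1 if src_dir cannot be opened. */
int for_each_file(const char *src_dir, const char *dst_dir,
                  file_handler handler)
{
    DIR *dir = opendir(src_dir);
    if (dir == NULL) {
        perror(src_dir);
        return -1;
    }

    int handled = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char src_path[4096];
        char dst_path[4096];
        struct stat st;

        int n = snprintf(src_path, sizeof src_path, "%s/%s",
                         src_dir, entry->d_name);
        int m = snprintf(dst_path, sizeof dst_path, "%s/%s",
                         dst_dir, entry->d_name);

        /* Skip names that do not fit in the path buffers. */
        if (n < 0 || (size_t)n >= sizeof src_path ||
            m < 0 || (size_t)m >= sizeof dst_path)
            continue;

        /* Skip ".", "..", subdirectories and anything unreadable. */
        if (stat(src_path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;

        if (handler(src_path, dst_path) == 0)
            handled++;
    }

    closedir(dir);
    return handled;
}
```

Passing the per-file work in as a function pointer is just one way to structure this. It would let the same loop be reused for conversion, barcode recognition and OCR.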

Now that I have that working, I’m going to write a function for barcode recognition.

That took about an hour and a half. Barcode recognition was a little trickier because I needed a couple of helper functions to call from within. Even so, it wasn’t too bad, and now my program can handle any barcode and write the data to a text file. I can also do file conversion and barcode recognition on all the files by just passing both flags as command line arguments. That’s pretty sweet.

Now last but not least, I need to do OCR, which is a really complex thing to do!

That took about 3 hours since I’m not the most C-savvy person. Now I can OCR any image from the directory and output the text to a .txt file.

I need to do some code cleaning and commenting but this should only take about 20 minutes.

That wraps things up and brings me in at less than 8 hours. In one work day I was able to write a batch processing application that can do OCR, barcode recognition and file conversion, all in one. This is extremely useful for Linux users! Write a script to run this application and you can automate a lot of work.

In a future release, I’d like to separate some of my code into functions, add multi-threading for performance, and allow either the int versions of the format constants or the common terms (tif, jpg, png, etc.) to be used.
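As a sketch of that last idea, a small lookup table could accept either an integer or a common name. Everything here is an assumption for illustration: parse_format is a hypothetical helper, and the FMT_ values are placeholders, not the SDK's real format constants.

```c
#include <limits.h>
#include <stdlib.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Placeholder codes: the real SDK defines its own format constants,
   and these values do not correspond to them. */
enum { FMT_UNKNOWN = -1, FMT_TIF = 1, FMT_JPG = 2, FMT_PNG = 3 };

static const struct {
    const char *name;
    int code;
} format_names[] = {
    {"tif", FMT_TIF}, {"tiff", FMT_TIF},
    {"jpg", FMT_JPG}, {"jpeg", FMT_JPG},
    {"png", FMT_PNG},
};

/* Accepts either a positive integer ("3") or a common name ("png").
   Integers are passed through without checking that they are valid
   format codes. Returns FMT_UNKNOWN if the argument is neither. */
int parse_format(const char *arg)
{
    char *end;
    long value = strtol(arg, &end, 10);

    /* The whole string must be a number for the integer form. */
    if (end != arg && *end == '\0' && value > 0 && value <= INT_MAX)
        return (int)value;

    size_t count = sizeof format_names / sizeof format_names[0];
    for (size_t i = 0; i < count; i++) {
        if (strcasecmp(arg, format_names[i].name) == 0)
            return format_names[i].code;
    }
    return FMT_UNKNOWN;
}
```

With a helper like this, the command line could take "tif", "TIFF" or a raw integer code and hand the same int to the conversion function.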
