ExtractText - Python

This tutorial shows how to extract text data in a file using the LEADTOOLS Cloud Services in a Python application.

Overview  
Summary: This tutorial covers how to make ExtractText requests and process the results using the LEADTOOLS Cloud Services in a Python application.
Completion Time: 30 minutes
Project: Download tutorial project (1 KB)
Platform: LEADTOOLS Cloud Services API
IDE: Visual Studio
Language: Python
Development License: Download LEADTOOLS

Required Knowledge

Be sure to review the following sites for information about LEADTOOLS Cloud Services API.

Application ID and Password

Create an account with LEADTOOLS Hosted Cloud Services to obtain both the Application ID and the Application Password strings.

Service Plans

LEADTOOLS Service Plan offerings:

Service Plan     Description
Free Trial       Free Evaluation
Page Packages    Prepaid Page Packs
Subscriptions    Prepaid Monthly Processed Pages

To further explore the offerings, refer to the LEADTOOLS Hosted Cloud Services page.

To obtain the necessary Application ID and Application Password, refer to Create an Account and Application with the LEADTOOLS Hosted Cloud Services.

Add the ExtractText Code

With the project created and the requests package added, coding can begin.

In the Solution Explorer, open ExtractText.py. Add the following variables at the top.

Note

Replace the placeholder strings "Replace with Application ID" and "Replace with Application Password" in the code with your own Application ID and Application Password.

# Simple script to make an ExtractText request to the LEADTOOLS Cloud Services and process the results.
import requests 
import sys 
import time 
 
servicesUrl = "https://azure.leadtools.com/api/" 
 
# The application ID. 
appId = "Replace with Application ID" 
 
# The application password. 
password = "Replace with Application Password" 
 
# The first page in the file to mark for processing 
firstPage = 1 
 
# Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed. 
lastPage = -1 
 
# We will be uploading the file via a URL. Files can also be passed by adding a PostFile to the request. Only one file is accepted per request.
# The services use the following priority when determining what a request is trying to do: GUID > URL > Request Body Content
fileURL = 'http://demo.leadtools.com/images/cloud_samples/ocr1-4.tif' 
baseRecognitionUrl = '{}Recognition/ExtractText?firstPage={}&lastPage={}&fileurl={}' 
formattedRecognitionUrl = baseRecognitionUrl.format( 
    servicesUrl, firstPage, lastPage, fileURL) 
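
The format string above places fileURL into the query string unescaped. As a minimal sketch, the same request URL could instead be built with urllib.parse.urlencode from the standard library, which percent-encodes reserved characters in the file URL. The helper name build_extract_text_url is hypothetical and not part of the LEADTOOLS API, and whether the service needs the value encoded is an assumption; the tutorial itself sends it unencoded.

```python
from urllib.parse import urlencode


def build_extract_text_url(services_url, first_page, last_page, file_url):
    """Hypothetical helper: build the ExtractText URL with an encoded query string."""
    query = urlencode({
        "firstPage": first_page,
        "lastPage": last_page,
        "fileurl": file_url,  # percent-encoded by urlencode
    })
    return "{}Recognition/ExtractText?{}".format(services_url, query)


example_url = build_extract_text_url(
    "https://azure.leadtools.com/api/",
    1,
    -1,
    "http://demo.leadtools.com/images/cloud_samples/ocr1-4.tif",
)
print(example_url)
```

Encoding matters most when the file URL carries its own query string, since an unescaped "&" or "=" in it would otherwise be read as part of the ExtractText parameters.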

Add a requests.post call to send the ExtractText request to the LEADTOOLS Cloud Services API. If the request succeeds, the response body (request.text) contains a unique identifier (GUID), which the next section uses to query the results.

request = requests.post(formattedRecognitionUrl, auth=(appId, password)) 
 
# If uploading a file alongside the HTTP request 
#baseRecognitionUrl ='{}Recognition/ExtractText?firstPage={}&lastPage={}' 
#formattedRecognitionUrl = baseRecognitionUrl.format( 
#    servicesUrl,firstPage, lastPage) 
#file = {'file' : open('path/to/file', 'rb')} 
#request = requests.post( 
#    formattedRecognitionUrl, auth=(appId, password), files = file) 
 
if request.status_code != 200:
    print("Error sending the ExtractText request")
    print(request.text)
    sys.exit(1)

# Grab the GUID from the response body
guid = request.text
print("Unique ID returned by the services: " + guid + "\n") 
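
Because the GUID is taken straight from the response text, it can be worth checking that the text really is a GUID before using it in the Query URL. The following is a minimal sketch using the standard uuid module. The helper name parse_guid is hypothetical, the sample GUID is made up for illustration, and stripping surrounding quotes is a defensive assumption rather than documented service behavior.

```python
import uuid


def parse_guid(response_text):
    """Hypothetical helper: return the GUID as a string, or None if the text is not a GUID."""
    # Assumption: the body is a bare GUID, possibly padded with whitespace or quotes.
    candidate = response_text.strip().strip('"')
    try:
        return str(uuid.UUID(candidate))
    except ValueError:
        return None


# Made-up GUID, not one returned by the services.
print(parse_guid("3f2504e0-4f89-11d3-9a0c-0305e82c3301"))
# Anything that is not a GUID yields None.
print(parse_guid("<html>Service unavailable</html>"))
```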

Next, create a Query request that uses the GUID returned by the ExtractText request. If successful, the response contains all of the request data in JSON format.

# Now, we need to Query the results 
print("Now Querying Results....") 
baseQueryUrl = '{}Query?id={}' 
formattedQueryUrl = baseQueryUrl.format(servicesUrl, guid) 
 
while True:  # Poll the services until the request has finished processing
    request = requests.post(formattedQueryUrl, auth=(appId, password))
    returnedData = request.json()
    # This script treats file statuses 100 and 123 as "not finished yet"
    if returnedData['FileStatus'] not in (100, 123):
        break
    time.sleep(5)
 
print("File finished processing with file status: " + 
      str(returnedData['FileStatus'])) 
 
# Any file status other than 200 is treated as a failure, so stop with a non-zero exit code
if returnedData['FileStatus'] != 200:
    sys.exit(1)
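
The polling loop above runs until the services report a finished status, with no upper bound on the wait. As a hedged sketch, the same loop could be wrapped in a function that gives up after a timeout. The name poll_until_done and its parameters are hypothetical; the pending statuses 100 and 123 are taken from the loop above, and the fake responses stand in for the Query call so the sketch runs without network access.

```python
import time


def poll_until_done(fetch_status, interval_seconds=5, timeout_seconds=300,
                    pending_statuses=(100, 123)):
    """Hypothetical helper: call fetch_status() until 'FileStatus' is no longer
    a pending status, or raise TimeoutError once the deadline has passed."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        data = fetch_status()
        if data['FileStatus'] not in pending_statuses:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("Request did not finish within the timeout")
        time.sleep(interval_seconds)


# Stand-in for the Query request: two pending answers, then a finished one.
fake_responses = iter([{'FileStatus': 100}, {'FileStatus': 123}, {'FileStatus': 200}])
result = poll_until_done(lambda: next(fake_responses), interval_seconds=0)
print(result['FileStatus'])
```

In the tutorial script, fetch_status could be a function that posts to formattedQueryUrl with the same credentials and returns request.json().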

Finally, parse the JSON data into a readable format.

try: 
    print("Results:") 
    returnedJson = returnedData['RequestData'] 
    for requestObject in returnedJson: 
        print("Service Type: " + requestObject['ServiceType']) 
        if requestObject['ServiceType'] == 'Recognition' and requestObject['RecognitionType'] == 'Text': 
            print("Data: " + requestObject['data']) 
except Exception as e: 
    print("Failed to Parse JSON") 
    print(str(e)) 
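
To reuse the extracted text rather than only print it, the parsing step can be written as a function that returns the text entries. This is a minimal sketch: extract_text_entries is a hypothetical helper, and the sample dictionary is made up to mirror only the fields the code above reads ('RequestData', 'ServiceType', 'RecognitionType', 'data'); it is not real service output.

```python
def extract_text_entries(returned_data):
    """Hypothetical helper: collect the 'data' field of every text-recognition result."""
    texts = []
    for request_object in returned_data.get('RequestData', []):
        if (request_object.get('ServiceType') == 'Recognition'
                and request_object.get('RecognitionType') == 'Text'):
            texts.append(request_object.get('data', ''))
    return texts


# Made-up sample; the second entry's service type is illustrative only.
sample = {
    'FileStatus': 200,
    'RequestData': [
        {'ServiceType': 'Recognition', 'RecognitionType': 'Text', 'data': 'Hello world'},
        {'ServiceType': 'Conversion'},
    ],
}
print(extract_text_entries(sample))
```

Using dict.get with defaults means a missing key produces an empty result instead of an exception, which trades the try/except above for a quieter failure mode.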

Run the Project

Run the project by pressing F5, or by selecting Debug -> Start Debugging.

If the steps were followed correctly, the console appears and the application displays the extracted text information from the returned JSON data.

Extracted Text Information

Wrap-up

This tutorial showed how to extract text information via the LEADTOOLS Cloud Services API.

Help Version 22.0.2024.3.20
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2023 LEAD Technologies, Inc. All Rights Reserved.

