
Uploading Using the Document Library

The LEADTOOLS Document library supports uploading documents from the user's drive to the cache system. This is useful when the cache is hosted on a separate machine that the user cannot access directly (for example, when the Document library is hosted in a web service used by a JavaScript or rich client application, such as the HTML5/JavaScript version of the LEADTOOLS Document Viewer working with the LEADTOOLS Document Web Service).

Uploading a document is supported only if a valid instance of a LEADTOOLS Cache Object is passed to DocumentFactory.UploadDocument or set up in the global DocumentFactory.Cache property.

All the methods for uploading a document are in the DocumentFactory class. The document can be uploaded in chunks so that the application can show a progress bar and optionally let the user abort the operation.

The application should first prompt the user for the location of the document and then obtain a file stream object with read access to the original file. At that point, the following actions are possible:

Create an instance of UploadDocumentOptions, populate any optional values, and then call the BeginUpload method. This method returns a temporary URI that the DocumentFactory creates to identify this document in subsequent calls. The URI uses the custom scheme leadcache://unique_guid_identifier and is not meant to be used by anything other than the rest of the upload methods, so the application must save it (for example, in a local variable) at this point. If the total length of the data being uploaded is known at this time, setting UploadDocumentOptions.DocumentDataLength can greatly help the factory optimize the loading operations.
Read a chunk from the source file into a byte array. The chunk size is up to the application: the larger the chunk, the fewer calls are needed and the shorter the whole upload operation, although too large a chunk might throttle the server connection. 64 KB is a good minimum. Once the chunk is read, call the DocumentFactory.UploadDocument method, passing the URI obtained from BeginUpload and the chunk of data.
Repeat until the file has been read and uploaded, then call EndUpload.
External annotation data can also be uploaded to the document at this time if desired. Call UploadAnnotations as many times as needed to upload the data in the same manner.
When uploading is finished, call the LoadFromUri or LoadFromCache method as usual, just as when loading a document from a remote URL. The DocumentFactory class checks the URI, identifies it as an uploaded document that has no physical file, and parses the document from the uploaded data instead. The data is stored in the cache and is disposed of when the LEADDocument object is disposed of and its cache items have expired.
The upload URL is not needed after the LEADDocument object is obtained. The LEADDocument object itself as well as the DocumentId are used as usual to interact with the document from this point on.
At any point during the upload, DocumentFactory.AbortUploadDocument can be called to abort the operation; the DocumentFactory then immediately deletes the data uploaded so far from the cache.
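The chunked read loop in the steps above can be sketched in C# as follows. This is a minimal illustration under stated assumptions: ChunkedUploader and its callback parameters are hypothetical names, not LEADTOOLS API. In a real application the uploadChunk callback would wrap DocumentFactory.UploadDocument(cache, uploadUri, buffer, offset, count) and the abort callback would wrap DocumentFactory.AbortUploadDocument. Keeping the LEADTOOLS calls behind delegates lets the loop itself run and be tested without a cache.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of LEADTOOLS: sends a stream to an upload
// callback in fixed-size chunks, reports progress, and aborts on request.
public static class ChunkedUploader
{
    // 64 KB, the minimum chunk size suggested above.
    public const int DefaultChunkSize = 64 * 1024;

    // uploadChunk(buffer, offset, count): in a real application this would call
    //   DocumentFactory.UploadDocument(cache, uploadUri, buffer, offset, count).
    // abort(): in a real application this would call DocumentFactory.AbortUploadDocument.
    // onProgress(bytesSent): returns false when the user asks to cancel.
    // Returns true if the whole stream was sent (the caller then calls EndUpload),
    // or false if the upload was aborted.
    public static bool Upload(
        Stream source,
        Action<byte[], int, int> uploadChunk,
        Action abort,
        Func<long, bool> onProgress,
        int chunkSize = DefaultChunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        long bytesSent = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first 'read' bytes of the buffer are valid for this chunk.
            uploadChunk(buffer, 0, read);
            bytesSent += read;
            if (!onProgress(bytesSent))
            {
                // Discard whatever has been uploaded so far.
                abort();
                return false;
            }
        }
        return true;
    }
}
```

When Upload returns true, the application calls DocumentFactory.EndUpload and then loads the document; when it returns false, the upload has already been aborted. Because UploadAnnotations is called repeatedly in the same manner, the same loop could also feed external annotation data to it.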

For an example, refer to DocumentFactory.BeginUpload.

Enable Streaming

During the upload process in UploadDocument, DocumentFactory saves each chunk into the cache, so the latest document data is available to all users of the system. If the system encounters an error and the process restarts, the upload operation can be restarted and uploading continues. This also keeps memory use to a minimum when uploading large documents, since only one chunk needs to be held in memory at any time. This is the behavior when UploadDocumentOptions.EnableStreaming is false (the default value).

If EnableStreaming is set to true, DocumentFactory does not save the chunk data into the cache during UploadDocument. Instead, it creates an internal memory stream and appends the chunks to it as they arrive. When EndUpload is called, the factory stores all the data from the stream into the cache at once. This may speed up uploading a document, but at the expense of more memory (the whole document's data is held in memory). Also, the upload operation cannot be restarted if the system encounters an error and the process restarts, because the data uploaded so far exists only in that process's memory. Therefore, it is recommended to leave EnableStreaming set to false unless the system is designed for single-process or single-user operation.
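To make the trade-off concrete, the following C# sketch models the two buffering strategies described above. It is a conceptual model only: UploadSessionModel is a hypothetical class, the Dictionary stands in for the shared cache, and the per-chunk key format is invented for illustration; it does not reflect how LEADTOOLS actually stores chunks in the cache.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Conceptual model only (hypothetical, not LEADTOOLS internals): contrasts how an
// upload session buffers data with EnableStreaming false versus true.
public sealed class UploadSessionModel
{
    private readonly Dictionary<string, byte[]> _cache; // stand-in for the shared cache
    private readonly string _key;
    private readonly bool _enableStreaming;
    private readonly MemoryStream _memory = new MemoryStream();
    private int _chunkIndex;

    public UploadSessionModel(Dictionary<string, byte[]> cache, string key, bool enableStreaming)
    {
        _cache = cache;
        _key = key;
        _enableStreaming = enableStreaming;
    }

    // Bytes currently held in this process's memory by the session (not counting the cache).
    public long BufferedBytes => _memory.Length;

    public void UploadChunk(byte[] buffer, int offset, int count)
    {
        if (_enableStreaming)
        {
            // Streaming: append to an in-memory stream; nothing reaches the cache yet.
            _memory.Write(buffer, offset, count);
        }
        else
        {
            // Default: persist each chunk immediately, so another process could continue.
            var chunk = new byte[count];
            Array.Copy(buffer, offset, chunk, 0, count);
            _cache[$"{_key}#{_chunkIndex++}"] = chunk;
        }
    }

    public void EndUpload()
    {
        if (_enableStreaming)
        {
            // Streaming: the whole document is written to the cache in one step.
            _cache[_key] = _memory.ToArray();
            _memory.SetLength(0);
        }
        // Default mode: the chunks are already in the cache; nothing left to copy in this model.
    }
}
```

In the default mode, BufferedBytes stays at zero and each chunk is in the cache as soon as it is uploaded, which is what allows an interrupted upload to continue. In streaming mode, BufferedBytes grows to the full document size and nothing reaches the cache until EndUpload.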

Post Upload Operations

Post upload operations can be performed on a document after it has been uploaded to the cache and before it is first loaded. These operations and their optional values can be set in the UploadDocumentOptions.PostUploadOperations dictionary.

The operations are performed when EndUpload is called.

The current version of LEADTOOLS contains support for the following post upload operations:

Auto-Linearize PDF

Linearized PDF documents are documents that are optimized for fast web viewing. When using client-side rendering, the LEADTOOLS JavaScript Document Viewer can take advantage of linearized PDF to start viewing the PDF file before all its content is downloaded.

Uploaded PDF files can be checked for linearization and converted upon upload using the following:

using System;
using Leadtools.Caching;
using Leadtools.Document;

// Automatically linearize (optimize for fast web viewing) PDF files that are larger than 1 MB.
// 'cache' is assumed to be the ObjectCache instance already set up by the application.
byte[] pdfData = ...; // PDF data to upload
string pdfPassword = null; // If the PDF is encrypted, set its password here
var uploadDocumentOptions = new UploadDocumentOptions();
uploadDocumentOptions.Cache = cache;
const int minimumLengthInBytes = 1024 * 1024;
uploadDocumentOptions.PostUploadOperations.Add(LEADDocument.PostUpload_LinearizePdfMinimumLength, minimumLengthInBytes.ToString());

// The factory will not perform this operation unless the correct MIME type is set:
uploadDocumentOptions.MimeType = "application/pdf";
uploadDocumentOptions.Password = pdfPassword;
// The total length is known up front, so pass it to help the factory optimize the upload:
uploadDocumentOptions.DocumentDataLength = pdfData.Length;

// Now upload (the whole file is already in memory, so it is sent as a single chunk)
Uri documentUri = DocumentFactory.BeginUpload(uploadDocumentOptions);
DocumentFactory.UploadDocument(cache, documentUri, pdfData, 0, pdfData.Length);
DocumentFactory.EndUpload(cache, documentUri);

The factory performs the following actions on EndUpload:

Determines whether the total upload size is greater than minimumLengthInBytes. If so, calls PDFFile.IsLinearized to check whether the PDF is already optimized. If it is not, calls PDFFile.Linearize.
If the PDF is encrypted, then a password is required to linearize it. This can be passed in UploadDocumentOptions.Password as shown above.
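The EndUpload checks above can be expressed as a small decision function. This C# sketch is hypothetical: AutoLinearizeSketch is not a LEADTOOLS class, and the isLinearized and linearize delegates are placeholders for PDFFile.IsLinearized and PDFFile.Linearize, whose actual signatures are not shown here. The MIME type check reflects the comment in the example above that the factory only performs this operation for application/pdf; the case-insensitive comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the auto-linearize decision made on EndUpload.
public static class AutoLinearizeSketch
{
    // Returns true if the linearize step was invoked.
    public static bool MaybeLinearize(
        IDictionary<string, string> postUploadOperations,
        string operationKey,     // e.g. the value of LEADDocument.PostUpload_LinearizePdfMinimumLength
        string mimeType,
        long uploadedLength,
        Func<bool> isLinearized, // placeholder for PDFFile.IsLinearized
        Action linearize)        // placeholder for PDFFile.Linearize
    {
        // Only PDF uploads qualify (case-insensitive match is an assumption).
        if (!string.Equals(mimeType, "application/pdf", StringComparison.OrdinalIgnoreCase))
            return false;

        // The operation is opt-in: its value is the minimum length, stored as a string.
        if (!postUploadOperations.TryGetValue(operationKey, out var value) ||
            !long.TryParse(value, out var minimumLength))
            return false;

        // The upload must be strictly larger than the minimum length.
        if (uploadedLength <= minimumLength)
            return false;

        // Skip files that are already optimized for fast web viewing.
        if (isLinearized())
            return false;

        linearize();
        return true;
    }
}
```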

The metadata of LEADDocument can be examined to determine if this is a PDF document with linearized data using the following:

LEADDocument document = DocumentFactory.LoadFromCache(loadFromCacheOptions); 
bool isLinearized = false; 
// Check if the metadata contains the key 
DocumentMetadata metadata = document.Metadata; 
if (metadata.ContainsKey(LEADDocument.MetadataKey_IsLinearized)) 
{ 
   // Yes 
   isLinearized = bool.Parse(metadata[LEADDocument.MetadataKey_IsLinearized]); 
} 
 
if (isLinearized) 
{ 
   // Perform additional actions 
}



Help Version 20.0.2020.4.3