Loading Using LEADTOOLS Document Library

The LEADTOOLS Document library loads a document by creating a LEADDocument object from data that resides in a disk file, at a remote URL, in a stream, or in the cache system after a previous upload.

Loading from Disk Files

To load a LEADDocument object from a disk file, create an instance of LoadDocumentOptions and then pass it along with the file name to DocumentFactory.LoadFromFile:

var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
LEADDocument document = DocumentFactory.LoadFromFile(fileName, loadDocumentOptions);

The following steps explain how this method works:

  1. LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in either LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.

  2. Caching is optional in this mode. It can speed up obtaining document image data or text when the application revisits pages, and it allows the document to be saved to the cache before it is disposed. Without it, all data is parsed from the original file as needed.

  3. If the value of LoadDocumentOptions.AnnotationsUri is not null, then it must also point to a disk file. You can create a new Uri object from the physical path to the annotation file on disk and set it in this property. This creates a Uri object with the file:/// scheme. Any other scheme (such as http) will fail when using LoadFromFile.

  4. The factory will obtain information on the file format in fileName using RasterCodecs.GetInformation. If this fails (if it is an invalid file format or the required LEADTOOLS file format assembly is not found) then an exception is thrown.

  5. A LEADDocument object is created and the following members are initialized:

    Member | Value
    DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache.
    Uri | new Uri(fileName).
    IsReadOnly | true.
    CacheUri | null, since the document has direct access to the physical file.
    Stream | null.
    HasStream | false.
    IsDownloaded | false, since the document was not downloaded.
    GetDocumentFileName | Returns the same fileName passed to LoadFromFile.
    GetDocumentStream | null.
    GetAnnotationsFileName | Returns the file name passed in LoadDocumentOptions.AnnotationsUri.
    GetAnnotationsStream | null.
    HasAnnotationsStream | false.
    DocumentType | The document type.
    MimeType | The MIME type of the document file format, set during load.
    HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file.
    LastCacheSyncTime | An arbitrary old date, since the document has not yet been saved to the cache.
    CacheStatus | DocumentCacheStatus.NotSynced, since the document has not yet been saved to the cache.
    AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache.
    AutoSaveToCache | false.
    InternalObject | The internal LEADTOOLS object that is being used to parse the document data.
    UserData | null.
    IsEncrypted | false unless the document is encrypted. If the document is encrypted, most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information.
    IsDecrypted | false.
    IsStructureSupported | true or false based on the MIME type of the document.
    Metadata | Ready to be used.
    Structure | Ready to be used.
    Images | Ready to be used.
    Text | Ready to be used.
    Pages | Ready to be used.
    Documents | Empty collection, since this is not a virtual document.
    HasDocuments | false.
    AutoDisposeDocuments | false.
    Annotations | Ready to be used.
  6. LoadFromFile returns with this LEADDocument object ready to be used.

  7. LEADDocument parses data from the original file on disk on demand. Therefore, the file passed as fileName to LoadFromFile must not be deleted while the LEADDocument is alive. Otherwise, errors will occur when accessing the document data.
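Step 3 above requires the annotations URI to use the file scheme. The following is a minimal sketch of building and checking such a URI with the standard System.Uri class. The class and method names (AnnotationsUriSketch, FromDiskPath, IsDiskFileUri) are hypothetical helpers introduced for illustration and are not part of LEADTOOLS.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of LEADTOOLS: builds and checks file-scheme URIs.
public static class AnnotationsUriSketch
{
    // Converts a (possibly relative) disk path into an absolute file URI.
    public static Uri FromDiskPath(string path)
    {
        return new Uri(Path.GetFullPath(path));
    }

    // True only for absolute URIs with the file scheme, the kind that
    // step 3 says LoadFromFile accepts for annotations.
    public static bool IsDiskFileUri(Uri uri)
    {
        return uri != null && uri.IsAbsoluteUri && uri.IsFile;
    }
}
```

The result of FromDiskPath could then be assigned to LoadDocumentOptions.AnnotationsUri before calling LoadFromFile. Checking with IsDiskFileUri first lets the application reject an http URI early instead of waiting for the load to fail.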

For an example, refer to DocumentFactory.LoadFromFile.

Loading from a Remote URL

To create a LEADDocument object from a remote URL, create an instance of LoadDocumentOptions and then pass it, along with a Uri object pointing to the remote location of the document file, to DocumentFactory.LoadFromUri:

var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
LEADDocument document = DocumentFactory.LoadFromUri(uri, loadDocumentOptions);

The following steps explain how this method works:

  1. LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in either LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.

  2. The cache is also optional in this mode. In addition to speeding up retrieval of document image data or text from previously visited pages, the cache can hold the document file downloaded from uri, as explained below.

  3. If the uri passed to LoadFromUri has the special LEAD cache scheme (detected using IsUploadDocumentUri), then the factory assumes it is the URI of a document previously uploaded to the cache using DocumentFactory.BeginUpload. The data is already in the cache, so nothing is downloaded and the factory skips to step 8 below.

  4. If the value of LoadDocumentOptions.AnnotationsUri is not null, then it will be treated as a remote URL and the data is downloaded by the factory in the same manner used for the document file as explained below.

  5. The factory will check the value of LoadDocumentOptions.UseCache:

    • If the value is true, then the document data is downloaded from uri into the cache system.

    • If the value is false, then the document data is downloaded from uri to a temporary file name created on the machine.

  6. Similarly, if LoadDocumentOptions.AnnotationsUri is not null, it will be downloaded either to the cache system or to a temporary file based on cache usage.

  7. When downloading the data, the factory will use the WebClient object in LoadDocumentOptions if not null. Otherwise, it will create a new instance and dispose it after it has been used. This allows the application to pass a custom WebClient with specific proxy or credential settings or to monitor the download progress.

  8. The factory will obtain information on the file format using RasterCodecs.GetInformation on the downloaded or temporary file or cache data. If this fails (if it is an invalid file format or the required LEADTOOLS file format assembly is not found), then the cache or downloaded data is deleted and an exception is thrown.

  9. A LEADDocument object is created and the following members are initialized:

    Member | Value
    DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache.
    Uri | The same uri passed to LoadFromUri.
    IsReadOnly | true.
    CacheUri | If the document was downloaded to the cache and the cache system has virtual directory capabilities, then this property contains a URI to the original document data (PDF, TIFF, DOCX, etc.). Otherwise, it is null.
    Stream | null.
    HasStream | false.
    IsDownloaded | true, since the document was downloaded.
    GetDocumentFileName | Returns the path to the cache item or temporary file containing the downloaded data of the original document. If the cache does not have direct access to the file system, then this will be null.
    GetDocumentStream | If the cache does not have direct access to the file system, then this returns a stream containing the original document. Otherwise, null.
    GetAnnotationsFileName | Returns the path to the cache item or temporary file containing the downloaded data of the annotations. If the cache does not have direct access to the file system, then this will be null.
    GetAnnotationsStream | If the cache does not have direct access to the file system, then this returns a stream containing the annotations. Otherwise, null.
    HasAnnotationsStream | true or false depending on the above.
    DocumentType | The document type.
    MimeType | The MIME type of the document file format, set during load.
    HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file.
    LastCacheSyncTime | An arbitrary old date, since the document has not yet been saved to the cache.
    CacheStatus | DocumentCacheStatus.NotSynced, since the document has not yet been saved to the cache.
    AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache.
    AutoSaveToCache | false.
    InternalObject | The internal LEADTOOLS object that is being used to parse the document data.
    UserData | null.
    IsEncrypted | false unless the document is encrypted. If the document is encrypted, most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information.
    IsDecrypted | false.
    IsStructureSupported | true or false based on the MIME type of the document.
    Metadata | Ready to be used.
    Structure | Ready to be used.
    Images | Ready to be used.
    Text | Ready to be used.
    Pages | Ready to be used.
    Documents | Empty collection, since this is not a virtual document.
    HasDocuments | false.
    AutoDisposeDocuments | false.
    Annotations | Ready to be used.
  10. LoadFromUri returns with this LEADDocument object ready to be used.

  11. The document parses the downloaded data. Therefore, the original URL passed to LoadFromUri is never used again, and the data it points to can be deleted right away.

  12. When the document is disposed, the temporary files and cache items will be deleted unless it is saved to the cache first.
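Step 7 above mentions passing a custom WebClient with proxy or credential settings. The following sketch shows one way to prepare such a client using only the standard System.Net types. DownloadClientSketch and its Create method are hypothetical names, and the user name, password, and proxy address are placeholders supplied by the caller.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of LEADTOOLS: prepares a WebClient for the factory to use.
public static class DownloadClientSketch
{
    // Creates a WebClient carrying the given credentials and, optionally, a proxy.
    // All argument values come from the caller; none are LEADTOOLS defaults.
    public static WebClient Create(string userName, string password, Uri proxyAddress)
    {
        var client = new WebClient();
        client.Credentials = new NetworkCredential(userName, password);
        if (proxyAddress != null)
        {
            client.Proxy = new WebProxy(proxyAddress);
        }
        return client;
    }
}
```

The application would assign the returned client to the WebClient property of LoadDocumentOptions before calling LoadFromUri. The text above only states that the factory disposes clients it creates itself, so it seems prudent for the application to dispose a client it passed in once loading has finished.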

For an example, refer to DocumentFactory.LoadFromUri.

Loading from a Remote URL Asynchronously

LoadFromUri does not return control to the application until the document is downloaded and parsed. To create a LEADDocument object from a remote URL asynchronously, create an instance of LoadDocumentAsyncOptions and pass it, along with a Uri object pointing to the remote location of the document file, to DocumentFactory.LoadFromUriAsync:

var loadDocumentAsyncOptions = new LoadDocumentAsyncOptions(); 
// Initialize loadDocumentAsyncOptions as needed. The Completed event is a must: 
loadDocumentAsyncOptions.Completed += (sender, e) => { 
   // Completed, use e.Document 
}; 
DocumentFactory.LoadFromUriAsync(uri, loadDocumentAsyncOptions); 

The following steps explain how this method works:

  1. LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.

  2. The cache is also optional in this mode. In addition to speeding up retrieval of document image data or text from previously visited pages, the cache can hold the document file downloaded from uri, as explained below.

  3. If the uri value passed to LoadFromUriAsync has the LEAD cache scheme, then the factory assumes it is the URI of a document previously uploaded to the cache using DocumentFactory.BeginUpload. The data is already in the cache, so nothing is downloaded and the factory skips to step 10 below.

  4. A thread is created to handle loading the document and control is returned to the application. The rest of these steps are performed in the thread procedure.

  5. If the value of LoadDocumentOptions.AnnotationsUri is not null, then it will be treated as a remote URL and the data is downloaded by the factory in the same manner used for the document file as explained below.

  6. The factory will check the value of LoadDocumentOptions.UseCache:

    • If the value is true, then the document data is downloaded from uri into the cache system.

    • If the value is false, then the document data is downloaded from uri to a temporary file created on the machine.

  7. Similarly, if LoadDocumentOptions.AnnotationsUri is not null, it will be downloaded either to the cache system or to a temporary file based on cache usage.

  8. When downloading the data, the factory will use the WebClient object in LoadDocumentOptions if not null. Otherwise, it will create a new instance and dispose it after it has been used. This allows the application to pass a custom WebClient with specific proxy or credential settings.

  9. The WebClient.DownloadProgressChanged event is mapped to LoadDocumentAsyncOptions.Progress if the value is not null to allow the user to monitor the progress of the download.

  10. When WebClient.DownloadFileCompleted occurs, the factory will obtain information on the file format using RasterCodecs.GetInformation on the downloaded or temporary file. If this fails (if it is an invalid file format or the required LEADTOOLS file format assembly is not found), then the cache or downloaded data is deleted and LoadDocumentAsyncOptions.Completed is fired with the error object in LoadAsyncCompletedEventArgs.Error.

  11. Otherwise, a LEADDocument object is created and the following members are initialized:

    Member | Value
    DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache.
    Uri | The same uri passed to LoadFromUriAsync.
    Stream | null.
    HasStream | false.
    IsDownloaded | true, since the document was downloaded.
    GetDocumentFileName | Returns the path to the cache item or temporary file containing the downloaded data of the original document. If the cache does not have direct access to the file system, then this value will be null.
    GetDocumentStream | If the cache does not have direct access to the file system, then this returns a stream containing the original document. Otherwise, null.
    GetAnnotationsFileName | Returns the path to the cache item or temporary file containing the downloaded data of the annotations. If the cache does not have direct access to the file system, then this will be null.
    GetAnnotationsStream | If the cache does not have direct access to the file system, then this returns a stream containing the annotations. Otherwise, null.
    HasAnnotationsStream | true or false depending on the above.
    DocumentType | The document type.
    MimeType | The MIME type of the document file format, set during load.
    HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file.
    LastCacheSyncTime | An arbitrary old date, since the document has not yet been saved to the cache.
    CacheStatus | DocumentCacheStatus.NotSynced, since the document has not yet been saved to the cache.
    AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache.
    AutoSaveToCache | false.
    InternalObject | The internal LEADTOOLS object that is being used to parse the document data.
    UserData | null.
    IsEncrypted | false unless the document is encrypted. If the document is encrypted, most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information.
    IsDecrypted | false.
    IsStructureSupported | true or false based on the MIME type of the document.
    Metadata | Ready to be used.
    Structure | Ready to be used.
    Images | Ready to be used.
    Text | Ready to be used.
    Pages | Ready to be used.
    Documents | Empty collection, since this is not a virtual document.
    HasDocuments | false.
    AutoDisposeDocuments | false.
    Annotations | Ready to be used.
  12. The LoadDocumentAsyncOptions.Completed event is fired with the LEADDocument object in LoadAsyncCompletedEventArgs.Document. This LEADDocument object is now ready to be used.

  13. LEADDocument parses the downloaded data. Therefore, the original URL passed to LoadFromUriAsync is never used again, and the data it points to can be deleted right away.

  14. When LEADDocument is disposed, the temporary files will be deleted unless it is saved to the cache first.
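Steps 10 and 12 above describe the two outcomes the Completed handler can see: an error object, or the loaded document. An application that prefers to await the result could bridge the event to a Task. The following is a minimal sketch of that bridging logic using only standard .NET types. CompletedEventBridge and Settle are hypothetical names, not LEADTOOLS API.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of LEADTOOLS: turns a Completed callback into a Task.
public static class CompletedEventBridge
{
    // Settles the task from the two values the Completed handler receives:
    // a non-null error faults the task, otherwise the document becomes its result.
    public static void Settle<T>(TaskCompletionSource<T> source, Exception error, T document)
    {
        if (error != null)
        {
            source.TrySetException(error);
        }
        else
        {
            source.TrySetResult(document);
        }
    }
}
```

Inside the Completed handler, the call would be CompletedEventBridge.Settle(source, e.Error, e.Document), where source is a TaskCompletionSource of LEADDocument created before calling LoadFromUriAsync. The application can then await source.Task instead of continuing its work inside the event handler.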

For an example, refer to DocumentFactory.LoadFromUriAsync.

Loading from a Stream

To create a LEADDocument object from a document stored in a stream, create an instance of LoadDocumentOptions and then pass it along with the stream object to DocumentFactory.LoadFromStream:

var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
LEADDocument document = DocumentFactory.LoadFromStream(stream, loadDocumentOptions);

The following steps explain how this method works:

  1. LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.

  2. Caching is optional in this mode. It can speed up obtaining document image data or text when the application revisits pages, and it allows the document to be saved to the cache before it is disposed. Without it, all data is parsed from the original stream as needed.

  3. If the value of LoadDocumentOptions.AnnotationsUri is not null, then it must point to a disk file. You can create a new Uri object from the physical path to the annotation file on disk and set it in this property. This creates a Uri object with the file:/// scheme. Any other scheme (such as http) will fail when using LoadFromStream.

  4. The factory will obtain information on the file format in stream using RasterCodecs.GetInformation. If this fails (if it is an invalid file format or the required LEADTOOLS file format assembly is not found) then an exception is thrown.

  5. A LEADDocument object is created and the following members are initialized:

    Member | Value
    DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache.
    Uri | null.
    Stream | The original stream passed to LoadFromStream.
    HasStream | true.
    IsDownloaded | false, since the document was not downloaded.
    GetDocumentFileName | null.
    GetDocumentStream | null.
    GetAnnotationsFileName | Returns the file name passed in LoadDocumentOptions.AnnotationsUri.
    GetAnnotationsStream | null.
    HasAnnotationsStream | false.
    DocumentType | The document type.
    MimeType | The MIME type of the document file format. The value is set during load.
    HasCache | The same value as LoadDocumentOptions.UseCache.
    LastCacheSyncTime | An arbitrary old date, since the document has not yet been saved to the cache.
    CacheStatus | DocumentCacheStatus.NotSynced, since the document has not yet been saved to the cache.
    AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache.
    AutoSaveToCache | false.
    InternalObject | The internal LEADTOOLS object being used to parse the document data.
    UserData | null.
    IsEncrypted | false unless the document is encrypted. If the document is encrypted, most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information.
    IsDecrypted | false.
    IsStructureSupported | true or false based on the MIME type of the document.
    Metadata | Ready to be used.
    Structure | Ready to be used.
    Images | Ready to be used.
    Text | Ready to be used.
    Pages | Ready to be used.
    Documents | Empty collection, since this is not a virtual document.
    HasDocuments | false.
    AutoDisposeDocuments | false.
    Annotations | Ready to be used.
  6. LoadFromStream returns with this LEADDocument object ready to be used.

  7. LEADDocument parses data from the original stream on demand. Therefore, the stream passed to LoadFromStream must be kept alive by the user while the LEADDocument is alive. Otherwise, errors will occur when accessing the document data.

If the document is saved into the cache using SaveToCache, then the entire content of the stream is saved into the cache, and the stream is no longer used and can be safely disposed by the user. When the document is later re-loaded from the cache using DocumentFactory.LoadFromCache, it is treated as if it was downloaded from an external resource and the stream functionality is not used (the value of Stream will be null).
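Because the stream must outlive the document, one option when the source stream is short-lived (a network response, for instance) is to copy it into a stream the application fully owns. The following sketch uses only System.IO. OwnedStreamSketch and CopyToOwnedStream are hypothetical names, and the approach buffers the whole document in memory, so it is an assumption that the document is of moderate size.

```csharp
using System.IO;

// Hypothetical helper, not part of LEADTOOLS: gives the application a stream it fully owns.
public static class OwnedStreamSketch
{
    // Copies everything remaining in source into a new MemoryStream and rewinds it,
    // so the copy can outlive the source. The caller must dispose the returned stream.
    public static MemoryStream CopyToOwnedStream(Stream source)
    {
        var copy = new MemoryStream();
        source.CopyTo(copy);
        copy.Position = 0;
        return copy;
    }
}
```

The returned stream would be passed to LoadFromStream and disposed only after the document itself has been disposed, or after the document has been saved to the cache as described above.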

For an example, refer to DocumentFactory.LoadFromStream.

MIME Type Whitelisting

If MIME type whitelisting is used, it is possible for the DocumentFactory load methods to return null as the resulting document if its MIME type was denied. Refer to DocumentMimeTypes for more information.
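Since a load method can return null when the MIME type is denied, calling code should check the result before using it. The following is a minimal sketch of such a guard using only standard .NET types. LoadResultSketch and RequireLoaded are hypothetical names, and throwing InvalidOperationException is one possible policy rather than LEADTOOLS behavior.

```csharp
using System;

// Hypothetical helper, not part of LEADTOOLS: guards against a null load result.
public static class LoadResultSketch
{
    // Returns the document unchanged, or throws when the factory returned null.
    // The source argument is only used to make the error message more helpful.
    public static T RequireLoaded<T>(T document, string source) where T : class
    {
        if (document == null)
        {
            throw new InvalidOperationException(
                "No document was returned (the MIME type may have been denied): " + source);
        }
        return document;
    }
}
```

An application could wrap each load call in RequireLoaded, or simply test the returned value against null and report the denial to the user.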

Cloning a Document

The following methods allow the user to create a clone (an exact copy) of a document stored in the cache:

Getting Document Information

The following methods can be used to quickly obtain information on a document without loading it. Information obtained includes the document name, MIME type, and number of pages:

Deleting Document from the Cache

Documents are automatically deleted when they expire, as set up in the cache policies. The following method can be used to manually delete a document from the cache at any time:

Help Version 20.0.2018.9.5
© 1991-2018 LEAD Technologies, Inc. All Rights Reserved.