
DocumentMimeTypes Class

Summary

MIME type whitelisting support.

Syntax
C#
public class DocumentMimeTypes 
C++/CLI
public: 
   ref class DocumentMimeTypes 
Python
class DocumentMimeTypes: 
Remarks

LEADTOOLS supports reading a large number of file formats. These include formats that are used frequently in document management systems such as PDF (application/pdf), TIFF (image/tiff) and DOCX (application/vnd.openxmlformats-officedocument.wordprocessingml.document). It also includes formats that are rarely used in these situations such as GIF files (image/gif).

DocumentFactory provides the DocumentFactory.LoadFromUri, DocumentFactory.LoadFromFile and DocumentFactory.LoadFromStream methods, which load a document from a URI, file or stream respectively. If the data contains an image or document format that LEADTOOLS can load, a new LEADDocument object is created and returned to the user.

In certain situations, an application may need to explicitly allow or disallow certain MIME types to lessen the possibility of failure, for security reasons, or to improve the user experience. This technique is called MIME type whitelisting.

DocumentFactory contains a static instance of the DocumentMimeTypes class in the DocumentFactory.MimeTypes property. This instance contains entries for MIME types and a status indicating the behavior of each (unspecified, allow, deny). These entries are stored in the DocumentMimeTypes.Entries dictionary, with each entry containing a MIME type as the key and a DocumentMimeTypeStatus enumeration member as the value. The default status of each MIME type is stored in the DocumentMimeTypes.DefaultStatus property, which has a value of DocumentMimeTypeStatus.Unspecified (meaning: perform the default action).

When DocumentFactory checks a MIME type of a document in the process of loading it, it will make a call to DocumentMimeTypes.GetStatus. This method will first check the Entries dictionary, and if the mime type key is found, will return the DocumentMimeTypeStatus value. If no such entry is found, the value of DefaultStatus is returned.

By default, the Entries dictionary is empty and the value of DefaultStatus is Unspecified causing this status value to be returned by GetStatus each time DocumentFactory is loading a document. Therefore, DocumentFactory will load all file formats with any MIME type supported by LEADTOOLS by default.
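The lookup rule described above can be modeled in a few lines. The following is a minimal Java sketch and not the LEADTOOLS implementation: MimeTypeLookupSketch and its Status enum are hypothetical stand-ins for DocumentMimeTypes and DocumentMimeTypeStatus, and the sketch assumes the lookup is a plain dictionary read with exact string matching and a fallback to the default status.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lookup rule; not part of LEADTOOLS.
final class MimeTypeLookupSketch {
   enum Status { UNSPECIFIED, ALLOWED, DENIED }

   // Stand-ins for DocumentMimeTypes.Entries and DocumentMimeTypes.DefaultStatus
   final Map<String, Status> entries = new HashMap<>();
   Status defaultStatus = Status.UNSPECIFIED;

   // Return the entry for the MIME type if one exists, otherwise the default status
   Status getStatus(String mimeType) {
      Status status = entries.get(mimeType);
      return status != null ? status : defaultStatus;
   }

   public static void main(String[] args) {
      MimeTypeLookupSketch mimeTypes = new MimeTypeLookupSketch();
      // Empty dictionary: every MIME type gets the default status
      System.out.println(mimeTypes.getStatus("image/gif"));
      // After adding an entry, only that MIME type changes
      mimeTypes.entries.put("image/gif", Status.DENIED);
      System.out.println(mimeTypes.getStatus("image/gif"));
      System.out.println(mimeTypes.getStatus("image/png"));
   }
}
```

In this model, changing defaultStatus affects only MIME types that have no entry, which is the property the examples below rely on.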

Example 1: Explicitly Disallow Certain MIME Types

To disable loading GIF files using the Document toolkit, add an entry for its mime type as follows:

C#
// Disallow GIF mime type (DocumentMimeTypes reference) 
DocumentFactory.MimeTypes.Entries.Add("image/gif", DocumentMimeTypeStatus.Denied); 
 
// Load a GIF file 
LEADDocument gifDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.gif", loadOptions); 
Debug.Assert(gifDocument == null); 

This results in gifDocument being null. The application can check for this value and take the next action, such as informing the user that the document has a MIME type that has been explicitly denied.

Note that all other MIME types not specified by the application will still work, since the value of DefaultStatus is Unspecified to perform the default action.

Example 2: Explicitly Allow Certain MIME Types

To allow loading only PDF and TIFF files using the Document toolkit, add entries for the mime types as follows:

C#
// Allow PDF and TIF 
DocumentFactory.MimeTypes.Entries.Add("application/pdf", DocumentMimeTypeStatus.Allowed); 
DocumentFactory.MimeTypes.Entries.Add("image/tiff", DocumentMimeTypeStatus.Allowed); 
 
// Load a PDF document 
LEADDocument pdfDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.pdf", loadOptions); 
// PDF is allowed per our requirement 
Debug.Assert(pdfDocument != null); 
 
// Load a TIF document 
LEADDocument tiffDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.tif", loadOptions); 
// TIFF is allowed per our requirement 
Debug.Assert(tiffDocument != null); 
 
// Load a GIF file 
LEADDocument gifDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.gif", loadOptions); 
// GIF is disallowed per our requirement (only PDF and TIFF) 
Debug.Assert(gifDocument == null); 

For the PDF and TIFF documents, the factory will call GetStatus, and since entries for their MIME types are found, the status (Allowed) is returned and the documents are loaded correctly.

For the GIF file, GetStatus will not find an entry for its MIME type and will return the value of DefaultStatus. Since this is Unspecified by default (perform the default action), the factory will still be able to load the GIF file. This is not what we wanted, and the assert will fail. Therefore, modify the example as follows:

C#
// Allow PDF and TIF 
DocumentFactory.MimeTypes.Entries.Add("application/pdf", DocumentMimeTypeStatus.Allowed); 
DocumentFactory.MimeTypes.Entries.Add("image/tiff", DocumentMimeTypeStatus.Allowed); 
// Deny GIF 
DocumentFactory.MimeTypes.Entries.Add("image/gif", DocumentMimeTypeStatus.Denied); 
 
// GIF is disallowed per our requirement (only PDF and TIFF) 
Debug.Assert(gifDocument == null); 

And now gifDocument will be null and our requirement is met.

What about loading a PNG file?

C#
// Load a PNG file 
LEADDocument pngDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.png", loadOptions); 
// PNG is disallowed per our requirement (only PDF and TIFF) 
Debug.Assert(pngDocument == null); 

However, this does not work and the document is loaded because DefaultStatus is still Unspecified. We could add image/png to the list of denied MIME types but this will fail again for the next new MIME type we encounter. Instead, to meet our requirement of only allowing PDF and TIFF documents, modify the example like this:

C#
// Allow PDF and TIF 
DocumentFactory.MimeTypes.Entries.Add("application/pdf", DocumentMimeTypeStatus.Allowed); 
DocumentFactory.MimeTypes.Entries.Add("image/tiff", DocumentMimeTypeStatus.Allowed); 
// Disallow everything else instead of denying MIME types manually 
DocumentFactory.MimeTypes.DefaultStatus = DocumentMimeTypeStatus.Denied; 
 
// Load a PDF document 
LEADDocument pdfDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.pdf", loadOptions); 
// PDF is allowed per our requirement 
Debug.Assert(pdfDocument != null); 
 
// Load a TIF document 
LEADDocument tiffDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.tif", loadOptions); 
// TIFF is allowed per our requirement 
Debug.Assert(tiffDocument != null); 
 
// Load a GIF file 
LEADDocument gifDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.gif", loadOptions); 
// GIF is disallowed per our requirement (only PDF and TIFF) 
Debug.Assert(gifDocument == null); 
 
// Load a PNG file 
LEADDocument pngDocument = DocumentFactory.LoadFromUri("http://example.org/images/file.png", loadOptions); 
// PNG is disallowed per our requirement (only PDF and TIFF) 
Debug.Assert(pngDocument == null); 

Using Entries and DefaultStatus, the application can have any combination of explicitly allowing or denying any or all MIME types.

Under the Hood

DocumentFactory will check for MIME types using GetStatus during LoadFromUri, LoadFromFile and LoadFromStream as follows:

  • First, the factory checks LoadDocumentOptions.MimeType. This member has a default value of null but can be set by the user application to the actual MIME type of the document being loaded. The factory will call GetStatus passing this value and fail loading the document if the status is Denied.

  • Next, for LoadFromUri, the factory can obtain the MIME type (media type) from the URL by reading the HTTP headers returned by the server hosting the document. It will also be checked and if denied, the load fails.

  • Next, for uploaded documents, a MIME type can also be set by the user application in UploadDocumentOptions.MimeType. If this value was set by the user then it will also be checked and if denied, the load will fail.

  • For the first two options, the MIME type might not be available or might be set to a wrong value. Therefore, the factory finally obtains the real MIME type from the actual image data (using RasterCodecs) and checks it again. If it is denied, the load fails.

  • Finally, the MIME type obtained from all of the above is stored in the LEADDocument.MimeType property and the load succeeds. The value of DocumentCacheInfo.MimeTypeStatus for this document will be set to the status found during this load operation.
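The order of checks in the list above can be sketched as a small Java model. This is an illustration under stated assumptions and not the factory's actual code: MimeTypeCheckOrderSketch and resolve are hypothetical names, unavailable MIME types are represented as null and skipped, a plain set of denied MIME types stands in for GetStatus, and the model keeps the last available MIME type as the final one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the check order; not the LEADTOOLS implementation.
final class MimeTypeCheckOrderSketch {
   // Candidates are ordered as in the list above: user, URL, cache, data.
   // A null candidate means the MIME type is not available from that source.
   // Returns the MIME type to keep, or empty if any candidate is denied
   // or no candidate is available.
   static Optional<String> resolve(List<String> candidates, Set<String> denied) {
      String last = null;
      for (String mimeType : candidates) {
         if (mimeType == null)
            continue; // Nothing to check for this source
         if (denied.contains(mimeType))
            return Optional.empty(); // A denied MIME type fails the load
         last = mimeType;
      }
      return Optional.ofNullable(last);
   }

   public static void main(String[] args) {
      Set<String> denied = new HashSet<>(Arrays.asList("image/gif"));
      // The user claims PDF but the data is a GIF: the final data check rejects the load
      System.out.println(resolve(Arrays.asList("application/pdf", null, null, "image/gif"), denied).isPresent());
      // The URL header and the data agree on PDF: the load goes through
      System.out.println(resolve(Arrays.asList(null, "application/pdf", null, "application/pdf"), denied).isPresent());
   }
}
```

The first call shows why the data check matters in this model: a wrong or missing MIME type from an earlier source cannot be used to bypass a denied type.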

All the above can be logged and traced using the UserGetDocumentStatusHandler callback. The application can set a custom handler in UserGetDocumentStatus and the factory will invoke this callback for all the operations above with the following parameters:

Parameter   Description
uri         The URI to the document being loaded.
options     The LoadDocumentOptions object passed by the user.
source      The source of this callback invocation.
mimeType    The MIME type being checked.

source can be any of the following:

Member                         Value
DocumentMimeTypeSource.User    The MIME type is passed by the user. For instance, in LoadDocumentOptions.MimeType.
DocumentMimeTypeSource.Cache   The MIME type is stored in the cache, for example, from UploadDocumentOptions.MimeType.
DocumentMimeTypeSource.Url     The MIME type was obtained from the HTTP headers of a URL as set by the server containing the document.
DocumentMimeTypeSource.Data    The MIME type is read by LEADTOOLS RasterCodecs from the actual image data.
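
How a custom callback relates to the configured entries can be pictured with the following Java sketch. It is a simplified model and not the LEADTOOLS API: StatusCallbackSketch, Handler, configuredStatus and check are hypothetical names, and the model assumes that when a handler is set it is consulted for every check and may either return its own status or delegate to the configured lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a user status callback; not part of LEADTOOLS.
final class StatusCallbackSketch {
   enum Status { UNSPECIFIED, ALLOWED, DENIED }
   enum Source { USER, CACHE, URL, DATA }

   // Stand-in for the user callback type
   interface Handler {
      Status getStatus(Source source, String mimeType);
   }

   final Map<String, Status> entries = new HashMap<>();
   Status defaultStatus = Status.UNSPECIFIED;
   Handler userHandler; // null means no custom handler is installed

   // The configured action: dictionary lookup with a fallback to the default status
   Status configuredStatus(String mimeType) {
      Status status = entries.get(mimeType);
      return status != null ? status : defaultStatus;
   }

   // What the model consults for each check: the handler if set, else the configured action
   Status check(Source source, String mimeType) {
      return userHandler != null ? userHandler.getStatus(source, mimeType) : configuredStatus(mimeType);
   }

   public static void main(String[] args) {
      StatusCallbackSketch mimeTypes = new StatusCallbackSketch();
      mimeTypes.defaultStatus = Status.DENIED;
      mimeTypes.entries.put("application/pdf", Status.ALLOWED);
      // A logging handler that delegates to the configured action
      mimeTypes.userHandler = (source, mimeType) -> {
         Status status = mimeTypes.configuredStatus(mimeType);
         System.out.println(source + " " + mimeType + " -> " + status);
         return status;
      };
      mimeTypes.check(Source.URL, "application/pdf");
      mimeTypes.check(Source.DATA, "image/png");
   }
}
```

A handler that only logs and delegates, as here, leaves the allow and deny behavior unchanged. A handler that returns its own status replaces the configured result for that check.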

The RasterCodecs utility methods GetExtensionMimeType, GetMimeType and GetMimeTypeExtension can be used to obtain a MIME type to/from an extension or from a LEADTOOLS RasterImageFormat enumeration member.

Example

This example will allow only loading PDF and TIFF documents and deny everything else. The example also installs a callback to log all the MIME type verification operations. The user callback can return any value for the status of the MIME type or call GetDocumentStatus to continue with the configured action.

C#
using System; 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Document.Writer; 
 
using Leadtools.Document; 
using Leadtools.Caching; 
using Leadtools.Annotations.Engine; 
using Leadtools.Ocr; 
using Leadtools.Barcode; 
using Leadtools.Document.Converter; 
 
public void MimeTypesWhitelistExample() 
{ 
   // Documents to try and load 
   string[] urls = 
   { 
      "https://demo.leadtools.com/images/pdf/leadtools.pdf", 
      "https://demo.leadtools.com/images/tiff/ocr.tif", 
      "https://demo.leadtools.com/images/png/pngimage.png" 
    }; 
 
   // Setup a callback for logging 
   // DocumentMimeTypes & DocumentMimeTypes.UserGetDocumentStatusHandler reference 
   DocumentMimeTypes.UserGetDocumentStatusHandler userGetDocumentStatus = (Uri uri, LoadDocumentOptions options, DocumentMimeTypeSource source, string mimeType) => 
   { 
      // Use default operation 
      DocumentMimeTypeStatus status = DocumentFactory.MimeTypes.GetDocumentStatus(uri, options, source, mimeType); 
      string mimeTypeValue = mimeType != null ? mimeType : "[null]"; 
      DocumentFactory.MimeTypes.GetStatus(mimeTypeValue); 
      Console.WriteLine(string.Format("  ** Whitelist url:{0} source:{1} mimeType:{2} status:{3}", uri.ToString(), source, mimeTypeValue, status)); 
 
      return status; 
   }; 
 
   DocumentFactory.MimeTypes.UserGetDocumentStatus = userGetDocumentStatus; 
 
   // Load the documents, by default we should load all of them 
   Console.WriteLine("Everything should load OK"); 
   LoadDocuments(urls); 
 
   // Now, disable loading everything except PDF and TIFF and try again 
   Console.WriteLine("Disabling everything except PDF and TIFF"); 
   DocumentFactory.MimeTypes.DefaultStatus = DocumentMimeTypeStatus.Denied; 
   DocumentFactory.MimeTypes.Entries.Add("application/pdf", DocumentMimeTypeStatus.Allowed); 
   DocumentFactory.MimeTypes.Entries.Add("image/tiff", DocumentMimeTypeStatus.Allowed); 
   Console.WriteLine("Only PDF and TIFF should be loaded"); 
   LoadDocuments(urls); 
 
   // Reset 
   DocumentFactory.MimeTypes.UserGetDocumentStatus = null; 
} 
 
private static void LoadDocuments(string[] urls) 
{ 
   var loadDocumentOptions = new LoadDocumentOptions(); 
 
   foreach (var url in urls) 
   { 
      Console.WriteLine(" Loading " + url); 
      using (var document = DocumentFactory.LoadFromUri(new Uri(url), loadDocumentOptions)) 
      { 
         if (document != null) 
            Console.WriteLine("  is Loaded"); 
         else 
            Console.WriteLine("  cannot be loaded"); 
      } 
   } 
} 
 
Java
import java.io.File; 
import java.io.FileOutputStream; 
import java.io.IOException; 
import java.net.MalformedURLException; 
import java.net.URI; 
import java.net.URISyntaxException; 
import java.net.URL; 
import java.nio.file.Files; 
import java.nio.file.Paths; 
import java.util.ArrayList; 
import java.util.Calendar; 
import java.util.List; 
import java.util.concurrent.Callable; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
import java.util.concurrent.Future; 
import java.util.regex.Pattern; 
 
import org.junit.*; 
import org.junit.runner.JUnitCore; 
import org.junit.runner.Result; 
import org.junit.runner.notification.Failure; 
import static org.junit.Assert.*; 
 
import leadtools.*; 
import leadtools.annotations.engine.*; 
import leadtools.barcode.*; 
import leadtools.caching.*; 
import leadtools.codecs.*; 
import leadtools.document.*; 
import leadtools.document.DocumentMimeTypes.UserGetDocumentStatusHandler; 
import leadtools.document.converter.*; 
import leadtools.document.writer.*; 
import leadtools.ocr.*; 
 
 
public void documentMimeTypesExample() throws URISyntaxException { 
   // Documents to try and load 
   String[] urls = { 
         "https://demo.leadtools.com/images/pdf/leadtools.pdf", 
         "https://demo.leadtools.com/images/tiff/ocr.tif", 
         "https://demo.leadtools.com/images/png/pngimage.png" 
   }; 
 
   // Setup a callback for logging 
   // DocumentMimeTypes & DocumentMimeTypes.UserGetDocumentStatusHandler reference 
   UserGetDocumentStatusHandler userGetDocumentStatus = new UserGetDocumentStatusHandler() { 
 
      @Override 
      public DocumentMimeTypeStatus userGetDocumentStatus(URI uri, LoadDocumentOptions options, 
            DocumentMimeTypeSource source, String mimeType) { 
         // Use default operation 
         DocumentMimeTypeStatus status = DocumentFactory.getMimeTypes().getDocumentStatus(uri, options, source, 
               mimeType); 
         String mimeTypeValue = mimeType != null ? mimeType : "[null]"; 
         DocumentFactory.getMimeTypes().getStatus(mimeTypeValue); 
         System.out.println(String.format("  ** Whitelist url:%s source:%s mimeType:%s status:%s%n", uri.toString(), 
               source, mimeTypeValue, status)); 
 
         return status; 
      } 
 
   }; 
 
   DocumentFactory.getMimeTypes().setUserGetDocumentStatus(userGetDocumentStatus); 
 
   // Load the documents, by default we should load all of them 
   System.out.println("Everything should load OK"); 
   loadDocuments(urls); 
 
   // Now, disable loading everything except PDF and TIFF and try again 
   System.out.println("Disabling everything except PDF and TIFF"); 
   DocumentFactory.getMimeTypes().setDefaultStatus(DocumentMimeTypeStatus.DENIED); 
   DocumentFactory.getMimeTypes().getEntries().put("application/pdf", DocumentMimeTypeStatus.ALLOWED); 
   DocumentFactory.getMimeTypes().getEntries().put("image/tiff", DocumentMimeTypeStatus.ALLOWED); 
   System.out.println("Only PDF and TIFF should be loaded"); 
   loadDocuments(urls); 
 
   // Reset 
   DocumentFactory.getMimeTypes().setUserGetDocumentStatus(null); 
 
} 
 
private void loadDocuments(String[] urls) throws URISyntaxException { 
   LoadDocumentOptions loadDocumentOptions = new LoadDocumentOptions(); 
   for (String url : urls) { 
      System.out.println(" Loading " + url); 
      LEADDocument document = DocumentFactory.loadFromUri(new URI(url), loadDocumentOptions); 
      if (document != null) 
         System.out.println("  is Loaded"); 
      else 
         System.out.println("  cannot be loaded"); 
   } 
} 
Requirements

Target Platforms

Help Version 23.0.2024.2.29
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.

Leadtools.Document Assembly