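The LEADTOOLS Document Library supports redacting sensitive information in documents, with separate options for viewing and for conversion. The current options are read from the Annotations.RedactionOptions property of the document: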
DocumentRedactionOptions redactionOptions = leadDocument.Annotations.RedactionOptions;
By default, the value is null. Therefore, when loading from a URI or creating a new document, the application must create a new instance of DocumentRedactionOptions and set it first, as follows:
DocumentRedactionOptions redactionOptions = new DocumentRedactionOptions();
// Modify redactionOptions as needed, then set it:
leadDocument.Annotations.RedactionOptions = redactionOptions;
DocumentRedactionOptions exposes ViewOptions and ConvertOptions, and each of them contains the following properties:
Mode. A DocumentRedactionMode value that specifies how to redact the document when it is viewed or converted. The default value is None, which means that no redaction is performed on the document. Set the value to Apply or ApplyThenRasterize to perform redaction.
IntersectionPercentage. An integer representing the percentage of intersection to use when redacting text characters.
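One plausible reading of IntersectionPercentage is a coverage threshold: a character is redacted when at least that percentage of its bounding box lies inside the redaction rectangle. The sketch below models that reading only. The helpers CoveredPercent and ShouldRedact, the tuple layout, and the "greater than or equal" rule are assumptions made for illustration and are not LEADTOOLS APIs.

```csharp
using System;

// Hypothetical helper: percentage (0-100) of a character's bounding box that
// is covered by the redaction rectangle. Rectangles are (X, Y, W, H).
double CoveredPercent((double X, double Y, double W, double H) ch,
                      (double X, double Y, double W, double H) redaction)
{
    double overlapW = Math.Min(ch.X + ch.W, redaction.X + redaction.W) - Math.Max(ch.X, redaction.X);
    double overlapH = Math.Min(ch.Y + ch.H, redaction.Y + redaction.H) - Math.Max(ch.Y, redaction.Y);
    if (overlapW <= 0 || overlapH <= 0 || ch.W <= 0 || ch.H <= 0)
        return 0.0; // no overlap, or a degenerate character box
    return 100.0 * (overlapW * overlapH) / (ch.W * ch.H);
}

// Assumed rule: redact the character once coverage reaches the threshold.
bool ShouldRedact((double X, double Y, double W, double H) ch,
                  (double X, double Y, double W, double H) redaction,
                  int intersectionPercentage)
    => CoveredPercent(ch, redaction) >= intersectionPercentage;

var redactionRect = (X: 100.0, Y: 50.0, W: 80.0, H: 20.0);
var fullyInside   = (X: 110.0, Y: 52.0, W: 8.0, H: 16.0); // entirely covered
var halfInside    = (X: 96.0,  Y: 52.0, W: 8.0, H: 16.0); // half of its width covered

Console.WriteLine(ShouldRedact(fullyInside, redactionRect, 50)); // True
Console.WriteLine(ShouldRedact(halfInside, redactionRect, 50));  // True
Console.WriteLine(ShouldRedact(halfInside, redactionRect, 75));  // False
```

Under this model, a lower percentage redacts characters that are only clipped by the edge of the rectangle, while a higher one redacts only characters that are mostly covered.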
This example shows how to redact a document for viewing and conversion using the LEADTOOLS Document Viewer Demo. Each step is explained with code snippets showing how to set the options that perform the actions.
This screenshot shows a loaded account balance PDF file:
This document contains sensitive information that needs to be redacted. The information is the Account Number, with a value of 12345678-123, shown highlighted below:
Use the Annotation Redaction object (AnnRedactionObject) from the annotation menu to draw a redaction around the account number:
It is also possible to use the text redaction object (AnnTextRedactionObject) to perform the same action.
Save the document using File/Save to cache, and then re-open it with the same document ID. Notice that the redaction object is visible and covers the account number value. However, the redaction object can still be selected, moved, or deleted to uncover the sensitive information:
This is because the redaction object is a live object and is only rendered on top of the document. The object is also included in the Annotations Objects list and can be interacted with just like any normal annotation object.
Perform a select text operation on the text and notice that the sensitive information can be selected and copied to the clipboard as well:
When exporting this document to an external PDF or DOCX, the information will be visible as well.
This behavior occurs when the value of LEADDocument.Annotations.RedactionOptions is null or when it is using the default values:
leadDocument.Annotations.RedactionOptions.ConvertOptions.Mode = DocumentRedactionMode.None;
leadDocument.Annotations.RedactionOptions.ViewOptions.Mode = DocumentRedactionMode.None;
In this mode, no special processing is performed on the document while it is being viewed. To perform true redaction, select Annotations/Redaction Options from the menu and change the Redaction Mode for View and Convert as follows, and click OK:
Setting these options calls the following:
// First, determine if this document has redaction options. If not, create a default one.
if (leadDocument.Annotations.RedactionOptions == null)
   leadDocument.Annotations.RedactionOptions = new DocumentRedactionOptions();
// Now, set the options to apply the redactions on view and convert
leadDocument.Annotations.RedactionOptions.ViewOptions.Mode = DocumentRedactionMode.Apply;
leadDocument.Annotations.RedactionOptions.ConvertOptions.Mode = DocumentRedactionMode.Apply;
After clicking OK, note that the document is re-loaded from the cache and now looks as follows:
The first thing to notice is that the redaction object has been removed from the list of Objects, although it is still visible on the document and the thumbnail. Trying to interact with the object has no effect (it is a physical part of the page now and cannot be moved or deleted).
Perform a select text operation on the text and notice that the sensitive information is redacted with the replacement character (the default is *):
Redaction is performed because the value of leadDocument.Annotations.RedactionOptions.ViewOptions.Mode is set to DocumentRedactionMode.Apply. This instructs the library to apply the redaction objects on each page inside the document itself (or on the server) before passing it to the viewer. DocumentPage.GetImage, DocumentPage.GetSvg, and DocumentPage.GetThumbnailImage all perform the action and return a redacted result that does not contain any sensitive information. For a raster image, the redaction is rendered onto the pixel data. For SVG, the redaction is performed by removing the characters from the text elements and replacing them with the redaction character.
Select File/Export to save this document to an external PDF or DOCX file. Notice that the resulting document is redacted as well and does not contain any sensitive information. The following screenshot shows the output document displayed in a Chrome browser with the account number selected:
Notice that the document stays "text" and only the redacted area is modified. This happens because the value of leadDocument.Annotations.RedactionOptions.ConvertOptions.Mode is set to DocumentRedactionMode.Apply. This instructs the library to apply the redaction objects on each page inside the document itself (or on the server) before passing it to the document converter. DocumentPage.GetImage, DocumentPage.GetSvg, DocumentPage.GetThumbnailImage, and DocumentPage.GetText all perform the action and return redacted data that does not contain any sensitive information, similar to the view options described above.
The annotations redaction object still lives inside the document. With these options set, it is applied on demand and removed from the annotations list of the document when viewed or converted. Go back into the document viewer and select Annotations/Redaction Options again and set the values of View/Redaction Mode and Convert/Redaction Mode back to the default value of None. After clicking OK, notice that the viewer reloads the document and now the redaction annotation object is live again and can be interacted with. Selecting the text of the account number shows that the original un-redacted value has been restored:
DocumentRedactionMode.ApplyThenRasterize works by applying the redaction to SVG pages and then rasterizing the whole page before returning it to the viewer or the converter. For raster pages, DocumentRedactionMode.ApplyThenRasterize works the same way as DocumentRedactionMode.Apply.
When selecting ApplyThenRasterize for viewing or conversion, only those SVG pages containing redaction objects are rasterized. All other pages stay as SVG.
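The per-page rule described above can be summarized as a small decision function. This is a hypothetical model written for illustration: ResultingPageKind and its string results are invented names, and only the mode names come from the text.

```csharp
using System;

// Hypothetical model of which kind of page data results for each mode.
// Mode names mirror the DocumentRedactionMode values named in the text;
// the function itself is not part of LEADTOOLS.
string ResultingPageKind(string mode, bool pageIsSvg, bool pageHasRedactions)
{
    if (!pageIsSvg)
        return "raster"; // raster pages stay raster in every mode
    if (mode == "ApplyThenRasterize" && pageHasRedactions)
        return "raster"; // a redacted SVG page is rasterized as a whole
    return "svg";        // None, Apply, or an SVG page without redactions
}

Console.WriteLine(ResultingPageKind("ApplyThenRasterize", pageIsSvg: true, pageHasRedactions: true));  // raster
Console.WriteLine(ResultingPageKind("ApplyThenRasterize", pageIsSvg: true, pageHasRedactions: false)); // svg
Console.WriteLine(ResultingPageKind("Apply", pageIsSvg: true, pageHasRedactions: true));               // svg
```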
The default replacement character (AnnotationsRedactionOptions.ReplaceCharacter) is *. To remove redacted text from the document altogether, use a value of 0.
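The effect of the replacement character on extracted text can be sketched as follows. RedactRange is a hypothetical helper, not a LEADTOOLS call, and it assumes that "a value of 0" means the character with code 0 ('\0').

```csharp
using System;
using System.Text;

// Hypothetical helper: redact `length` characters starting at `start`.
// A replaceCharacter of '\0' (value 0) drops the characters instead of masking them.
string RedactRange(string text, int start, int length, char replaceCharacter)
{
    var sb = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        bool redacted = i >= start && i < start + length;
        if (!redacted)
            sb.Append(text[i]);          // keep text outside the redacted range
        else if (replaceCharacter != '\0')
            sb.Append(replaceCharacter); // mask one character
        // else: value 0, so the character is removed altogether
    }
    return sb.ToString();
}

string line = "Account Number: 12345678-123";
int start = line.IndexOf("12345678-123", StringComparison.Ordinal);

Console.WriteLine(RedactRange(line, start, 12, '*'));  // Account Number: ************
Console.WriteLine(RedactRange(line, start, 12, '\0')); // Account Number: (value removed)
```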
Redaction is also applied when converting the document to a raster format such as TIF or raster PDF. The redactions are rendered over the page's pixel data. This is similar to DocumentConverterAnnotationsMode.Overlay.
Redactions are applied even if the underlying area of the page does not contain any text. This can be used to redact sensitive areas that contain images or other shapes.
Internally, document redaction applies the values during DocumentPage.GetImage, DocumentPage.GetSvg, DocumentPage.GetThumbnailImage, and DocumentPage.GetText. The modified data is then returned to the caller.
Because the Document library can extract text from any document with DocumentPage.GetText, end-user applications can perform dynamic redaction of sensitive data by parsing the text and analyzing it. For a very basic example, refer to DocumentAnnotations.RedactionOptions.
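As a rough illustration of the "parse the text, then redact" idea, the sketch below scans plain page text for values shaped like the account number used in this walkthrough. The regular expression, the FindSensitiveRanges helper, and the sample text are assumptions made for the example. Obtaining the text from DocumentPage.GetText and turning the matched ranges into redaction objects on the page are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical pattern for account numbers shaped like 12345678-123.
var accountNumber = new Regex(@"\b\d{8}-\d{3}\b");

// Returns (start, length) for every match found in the page text.
List<(int Start, int Length)> FindSensitiveRanges(string text)
{
    var ranges = new List<(int Start, int Length)>();
    foreach (Match m in accountNumber.Matches(text))
        ranges.Add((m.Index, m.Length));
    return ranges;
}

// Sample text standing in for the text of one page.
string pageText = "Statement\nAccount Number: 12345678-123\nBalance: 1,024.00";

foreach (var (s, len) in FindSensitiveRanges(pageText))
    Console.WriteLine($"offset {s}, length {len}: {pageText.Substring(s, len)}");
```

A real application would map each character range back to the bounds of the matching characters on the page before adding a redaction object there.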