Generating a Visual Review

Overview

When you generate a Standardization, you get pure data, with no record of where in the input document each value came from. Sometimes you care not only about the extracted information, like totalAmount: 50,000 or pets allowed: False, but also about where that information came from, and you want to be able to put a human in the loop to review the results.

A Review object is identical to a Standardization, except every extraction now also has:

  1. An exact localization, which we show as highlighted yellow-marker text. Under the hood, that means we tie the prediction to a page and a bounding box: an exact location on a specific page where you can see the evidence for the prediction.
  2. A confidence score, set to low/medium/high - reflecting how sure we are that this extraction is valid
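
To make the two additions concrete, here is a minimal sketch of how a single reviewed field could be modeled in Python. The class names, attribute names, and bounding-box layout (fractions of the page, origin at the top left) are assumptions made for illustration; they are not DocuPipe's actual response schema.

```python
from dataclasses import dataclass
from typing import Any, Literal

# The three confidence levels described above.
Confidence = Literal["low", "medium", "high"]


@dataclass(frozen=True)
class BoundingBox:
    # Hypothetical layout: fractions of page width/height, origin at top-left.
    left: float
    top: float
    width: float
    height: float


@dataclass(frozen=True)
class ReviewedField:
    name: str
    value: Any               # the extracted value, same as in the Standardization
    page: int                # page on which the evidence appears
    box: BoundingBox         # where on that page the evidence is highlighted
    confidence: Confidence   # how sure the system is that the value is valid


# Illustrative values only, not real output.
pets = ReviewedField(
    name="petsAllowed",
    value=False,
    page=3,
    box=BoundingBox(left=0.12, top=0.40, width=0.76, height=0.05),
    confidence="high",
)
```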

Common Use Cases

  1. The most common use case for a review object is to put a human reviewer in the loop. A human can go over an extraction, click on key fields, visually inspect exactly where they came from, and reject, edit, or approve the result. While most users get the most out of DocuPipe with full automation, this tool lets you build human-validated workflows.
  2. Sometimes the goal isn't to validate results but to convince a third party. For example, maybe you're building a real estate AI agent that needs to tell the user that their rental lease doesn't allow dogs. You want to be able to visually show the exact language in the contract that justifies the claim we extracted as Pets Allowed: False.
  3. Review objects are often used to explore data and help a human go over a large dataset. For example, a lawyer might want to show that a plaintiff had back pain on some date before an accident occurred. They run a schema search for Back Pain Reported: True, and then want to see where in a very long PDF the evidence of back pain actually appears.
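
For the human-in-the-loop case, one plausible pattern is to send only the less certain fields to a reviewer and let high-confidence fields pass through. The sketch below assumes each field is a plain dict with hypothetical "name" and "confidence" keys; the real review payload may be shaped differently.

```python
# Order the three confidence levels so they can be compared.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def fields_needing_review(fields, threshold="high"):
    """Return the names of fields whose confidence is below the threshold."""
    cutoff = CONFIDENCE_RANK[threshold]
    return [f["name"] for f in fields if CONFIDENCE_RANK[f["confidence"]] < cutoff]


# Illustrative data only.
fields = [
    {"name": "totalAmount", "confidence": "high"},
    {"name": "petsAllowed", "confidence": "medium"},
    {"name": "leaseStart", "confidence": "low"},
]

flagged = fields_needing_review(fields)                  # everything below "high"
urgent = fields_needing_review(fields, threshold="medium")  # only "low" fields
```

Choosing the threshold is a workflow decision: a stricter threshold sends more fields to a human, a looser one keeps more of the pipeline automated.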

Creating a Visual Review

Once you have a Standardization, you can generate a review object from it. You can do this from the Standardization Tab on your dashboard.

Select the Standardization results you want to review, and click the Create Review button at the top.

This will create review jobs, which can take 10 seconds to generate for a one-pager and potentially minutes for longer and more complex documents. You can follow your jobs' progress in the Jobs Tab. When the jobs are done, you can see the review objects in the Review Tab.

Let's click on our extraction for Lease Addendum:

On the left-hand side of the viewer, you can scroll through and click on any extraction result.

In the screenshot above, we clicked on the item relating to charcoal grills not being allowed.

The right-hand side automatically scrolls to the right page and highlights your current selection with a yellow marker. Here, the viewer highlights the legal language that says charcoal grills are not allowed.

Modifying a Review Object

If an extraction is incorrect, you can modify the review object in whatever way you like. Specifically you can:

  1. Remove fields
  2. Change field values
  3. Add more items to arrays

If the extraction is correct, or you're done making changes to it, hit the Finalize button, which will move the Review object into a Verified State.

Note that any modifications applied to the review object only modify that object. The original Standardization is immutable and does not change.
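
The copy-on-review behavior can be pictured with a small local model. This is only a sketch of the idea, not how DocuPipe stores data: the "pending" status name, the wrapper dict, and the field names are all assumptions.

```python
import copy


def start_review(standardization: dict) -> dict:
    """Wrap a deep copy of the standardization so edits never reach the original."""
    return {"status": "pending", "data": copy.deepcopy(standardization)}


def finalize(review: dict) -> dict:
    """Return the review marked as verified (mirrors the Finalize button)."""
    return {**review, "status": "verified"}


# Illustrative data only.
standardization = {
    "petsAllowed": False,
    "prohibitedItems": ["charcoal grills"],
    "notes": "n/a",
}

review = start_review(standardization)
del review["data"]["notes"]                             # 1. remove a field
review["data"]["petsAllowed"] = True                    # 2. change a field value
review["data"]["prohibitedItems"].append("fire pits")   # 3. add an item to an array
review = finalize(review)
```

The deep copy matters here: a shallow copy would share the prohibitedItems list, so appending to it in the review would also change the original.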

📘

For developers: When you approve or reject a Review, our Webhooks fire a corresponding event, allowing you to build automations that ingest only verified review objects into downstream applications.
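
A receiving service might filter these events along the following lines. This is a sketch under stated assumptions: the event name "review.approved" and the payload keys "event", "reviewId", and "data" are hypothetical placeholders, so check the Webhooks documentation for the real event names and payload shape.

```python
# Hypothetical event name; the real one may differ.
APPROVED_EVENT = "review.approved"


def handle_review_event(payload: dict, verified: dict) -> bool:
    """Store the review data only when the event reports an approval.

    Returns True if the payload was ingested, False if it was ignored.
    """
    if payload.get("event") != APPROVED_EVENT:
        return False
    verified[payload["reviewId"]] = payload["data"]
    return True


# Illustrative payloads only.
verified_reviews = {}
approved = {"event": "review.approved", "reviewId": "r-1", "data": {"petsAllowed": False}}
rejected = {"event": "review.rejected", "reviewId": "r-2", "data": {"petsAllowed": True}}

ingested_first = handle_review_event(approved, verified_reviews)
ingested_second = handle_review_event(rejected, verified_reviews)
```

Keeping the handler a pure function of the payload makes it easy to call from whichever web framework receives the webhook request.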

Limitations

Currently review operations work best when the input is limited in size:

  1. Document size should ideally be less than 20 pages. Consider using our AI Split operations to cut down input document size if you're dealing with very large documents.
  2. The number of extracted items should be less than 100.

If your use case exceeds these limits, please contact support.
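
If you create reviews programmatically, a simple pre-check against the two limits above can flag inputs that are likely to review poorly. The function name and warning messages below are illustrative, and the sketch assumes you already know the page count and item count for the document.

```python
# Limits taken from the list above: fewer than 20 pages, fewer than 100 items.
MAX_PAGES = 20
MAX_ITEMS = 100


def review_limit_warnings(page_count: int, item_count: int) -> list:
    """Return a warning for each recommended limit the input exceeds."""
    warnings = []
    if page_count >= MAX_PAGES:
        warnings.append(
            f"{page_count} pages: consider splitting the document before review"
        )
    if item_count >= MAX_ITEMS:
        warnings.append(
            f"{item_count} items: review works best with fewer than {MAX_ITEMS}"
        )
    return warnings


ok = review_limit_warnings(page_count=4, item_count=35)       # within both limits
too_big = review_limit_warnings(page_count=60, item_count=140)  # exceeds both
```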