Workflow: Split -> Classify -> Extract

Overview

A common pattern in document processing is to first split a document into sub-documents, then classify each sub-document and extract its contents using a schema specific to its class.

This guide assumes:

  1. You've already configured a Class for every sub-document you want to extract. You can read Configuring Classes to see how to define classes.
  2. You already have a Schema for each sub-document you're interested in standardizing. See Quick Start (5 minutes) to learn how to make a Schema.

Example - Rent Lease Extraction

On the previous page we broke a lease up into sub-components and found that we can get the lease terms separately from the lease addendums. Browsing through the addendum, we see that it covers things like pet policies and various restrictions on apartment use. We've gone ahead and made a schema for Addendums that looks like this:

And another schema for the lease terms, which looks like this:

Now we want to run this entire contract through a Workflow that orchestrates splitting the document, finding only the addendum and lease agreement, and normalizing each with the corresponding schema.

Building a Workflow

Go to the Workflows tab.

Hit +Workflow and select the Parsing -> Split -> Classify -> Standardize workflow type. Then select how you want classes to map to schemas (e.g. the rental Lease class goes to Rent Schema V2, and the lease addendums go to Addendum Schema).

💡 Notice that you can in principle map multiple classes to the same schema (for example: you can take both a doctor's note and a lab report, and extract only the patient name and insurance carrier with the same schema).
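The class-to-schema mapping can be pictured as a simple lookup table. The sketch below is purely illustrative: the real mapping is configured in the Workflows tab, and `CLASS_TO_SCHEMA`, `schema_for`, the exact class names, and the "Patient Info Schema" name are hypothetical, chosen only to mirror the examples above.

```python
from typing import Optional

# Hypothetical in-memory picture of the mapping configured in the UI.
# Class and schema names are assumptions based on the examples in this guide.
CLASS_TO_SCHEMA = {
    "Lease": "Rent Schema V2",
    "Lease Addendum": "Addendum Schema",
    # Many-to-one: two different classes share one schema.
    "Doctor's Note": "Patient Info Schema",
    "Lab Report": "Patient Info Schema",
}


def schema_for(class_name: str) -> Optional[str]:
    """Return the schema a class maps to, or None if the class is unmapped."""
    return CLASS_TO_SCHEMA.get(class_name)
```

Sub-documents whose class has no entry (here, anything returning `None`) would simply not be standardized.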

Using your Workflow

Now you can upload a new document straight into the workflow by clicking the green Apply button.

If you're a developer, you can also upload a document into a workflow using our API by calling Submit a Document for Processing with {workflowId: yourWorkflowId}.
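As a rough sketch of the programmatic route, the request might be assembled like this. Only the `workflowId` field comes from this guide; the endpoint URL, the `X-API-Key` header name, and the layout of the `document` field are assumptions made for illustration, so check the Submit a Document for Processing reference for the actual request shape.

```python
import base64
import json
import urllib.request

# Placeholder endpoint (an assumption): substitute the URL from the API reference.
API_URL = "https://api.example.com/document"


def build_payload(filename: str, content: bytes, workflow_id: str) -> dict:
    """Build a request body that routes the upload into a workflow.

    Only `workflowId` is taken from this guide; the `document` layout is assumed.
    """
    return {
        "document": {
            "file": {
                "filename": filename,
                "contents": base64.b64encode(content).decode("ascii"),
            }
        },
        "workflowId": workflow_id,
    }


def submit(payload: dict, api_key: str) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        # The header name is an assumption; use whatever the API reference specifies.
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping `build_payload` separate from `submit` makes it easy to inspect or log the request body before anything is sent over the network.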

Monitor Progress

  1. Track progress for each workflow run from the Jobs tab.
  2. You will see an Upload job, followed by a Split job. When the split is done, the resulting sub-documents populate the Documents tab. A new Classify job then starts for every sub-document, followed by a Standardize job for each sub-document whose class is mapped to a schema. When the workflow completes, new entries appear in the Standardization tab - one for each sub-document whose class you're mapping to a schema.
  3. If you need downstream systems to react automatically, subscribe to the standardization.processing.success webhook event. Learn more in the Webhooks Guide. Hint: When you upload a document programmatically, it's often a good idea to set metadata, for example {"metadata": {"uid": "your-input-document-uid"}}. This way, when your standardization event fires, you know which input document it relates to.
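To illustrate the metadata hint, a webhook receiver could recover the input document's uid along these lines. The event name comes from this guide, but the payload shape (a top-level `type` plus the `metadata` object echoed back) is an assumption, and `input_uid_from_event` is a hypothetical helper; consult the Webhooks Guide for the actual payload.

```python
from typing import Optional

SUCCESS_EVENT = "standardization.processing.success"


def input_uid_from_event(event: dict) -> Optional[str]:
    """Return the uid set at upload time, or None if the event is not relevant.

    Assumed event shape: {"type": "...", "metadata": {"uid": "..."}}.
    """
    if event.get("type") != SUCCESS_EVENT:
        return None
    # Tolerate a missing or null metadata field.
    metadata = event.get("metadata") or {}
    return metadata.get("uid")
```

Because one input document can yield several sub-documents, the same uid may arrive in several events; a downstream system would typically group results by it.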