Quick Start (5 minutes)
At its core, DocuPipe converts any document into a standard output with consistently defined fields. We call the extracted result a Standardization.
To standardize a document into consistent results, you first need to decide what structure and rules to apply. We call this set of definitions a Schema. Think of a schema as a set of slots for the information you expect to find. For a rental lease, for example, you might want to extract the monthly rent amount or the lease end date.
DocuPipe lets you define how your documents should be understood with minimal effort.
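As a mental model only (this is not DocuPipe's actual schema format, which you build in the dashboard), a schema can be pictured as a list of named slots and a standardization as the values that fill them. The Python sketch below uses hypothetical slot names for the lease example, `monthly_amount` and `lease_end_date`, and reports which slots an extracted result left empty.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Slot:
    """One piece of information we expect to find in every document of a type."""
    name: str
    description: str


# Hypothetical lease schema; slot names are illustrative, not DocuPipe's format.
LEASE_SCHEMA: List[Slot] = [
    Slot("monthly_amount", "Monthly rent owed by the renter"),
    Slot("lease_end_date", "Date the lease term ends"),
]


def empty_slots(schema: List[Slot], result: Dict[str, Any]) -> List[str]:
    """Return the names of slots that the extracted result did not fill."""
    return [slot.name for slot in schema if result.get(slot.name) is None]


if __name__ == "__main__":
    extracted = {"monthly_amount": 1800, "lease_end_date": None}
    print(empty_slots(LEASE_SCHEMA, extracted))  # ['lease_end_date']
```

The point of the model is that the slots are fixed per document type, so every result can be checked against the same list.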
The flow has three parts:
- Upload a document.
- Define what you want to extract from this document type. This generates a Schema.
- Extract results from many documents, producing output with a consistent structure. We call each result a Standardization.
Are you a developer? Check out our developer getting started guide.
Upload Documents
This tutorial follows a toy problem: standardizing rental leases. You can follow along with our example or extract a completely different document.
Here's an example lease you can use to follow along with this guide.
The first step is to upload the document. If you're a developer and want to upload through our API, see our API Getting Started Guide. Here we'll stick to the dashboard: hit "upload" and submit the document.
Building a Schema
Building a schema is the heart of DocuPipe. You typically select 1 to 20 documents of a given type and describe, in plain language, what you want to extract from that kind of document.
To try it, go to the Documents tab and select the document you uploaded.
Hit "Create Schema" (the purple button) and type in some instructions for how you want to understand rental leases.
You can be as thorough as you like about what you want. For this demo we'll be lazy and vague on purpose: "Extract the renter information, and the lease terms. Extract nothing else".
Hit Next and submit. Once processing finishes, you'll get a schema, which defines the slots that documents will be extracted into.
Let's take a quick look at our schema.
A schema is usually much easier to understand by looking at its output: the Standardization produced by running it on a document. By default, DocuPipe automatically generates a standardization for every document you selected during schema creation. Go to the Standardization tab and select your result.
Examining the Extracted Standardization
Click a standardization in the Standardization tab to see a visually organized representation of the extracted data.
Technically, each standardization is a JSON object. Click the word "JSON" to see it in raw form, or hit Download to export it in one of several formats:
- JSON
- Excel
- XML
To download many standardizations at once, multi-select them and download them as a CSV.
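If you'd rather assemble a CSV yourself from downloaded JSON files, the sketch below shows one way to do it. It assumes each standardization is a JSON object that may contain nested objects or lists (the lease fields in the example are made up) and flattens nested keys into dotted column names such as `renter.name`. The dashboard's own CSV export may lay out columns differently.

```python
import csv
import io
import json
from typing import Any, Dict, Iterable, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into dotted keys, e.g. renter.name or tenants.0."""
    flat: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = obj  # leaf value: drop the trailing dot
    return flat


def standardizations_to_csv(raw_jsons: Iterable[str]) -> str:
    """Convert JSON standardization strings into one CSV, one row per document."""
    rows = [flatten(json.loads(raw)) for raw in raw_jsons]
    # Union of all columns in first-seen order, so rows missing a field still line up.
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)  # missing fields become ""
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    docs = [
        '{"renter": {"name": "Jane Doe"}, "lease_terms": {"monthly_amount": 1800, "end_date": "2025-06-30"}}',
        '{"renter": {"name": "John Roe"}, "lease_terms": {"monthly_amount": 2100}}',
    ]
    print(standardizations_to_csv(docs))
```

Taking the union of columns means a document with a missing field still gets a row, with that cell left blank.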
Standardizing Many Documents Using the Same Schema
Once your schema is ready, you can apply it to millions of documents, reliably producing a Standardization for each one with the same, repeatable data structure.
To standardize more documents, upload them, select them, and hit the green Standardize button.
When you hit Standardize, you'll be prompted to pick which schema to run on the selected documents.
Select the relevant schema and hit Standardize. A modal will then pop up asking for optional parameters.
You can read more about the standardization options under Extraction Basics.
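Because every standardization produced by the same schema shares one structure, downstream code can treat all of them the same way. The sketch below assumes hypothetical lease results with `renter_name`, `monthly_amount`, and `lease_end_date` fields (dates as ISO 8601 strings); substitute the field names your own schema defines.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical standardizations from one lease schema; field names are illustrative.
LEASES: List[Dict[str, Any]] = [
    {"renter_name": "Jane Doe", "monthly_amount": 1800, "lease_end_date": "2025-06-30"},
    {"renter_name": "John Roe", "monthly_amount": 2100, "lease_end_date": "2026-01-31"},
    {"renter_name": "Ann Poe", "monthly_amount": 1500, "lease_end_date": None},
]


def leases_ending_before(leases: List[Dict[str, Any]], cutoff: date) -> List[str]:
    """Return renter names whose lease ends before `cutoff`, skipping missing dates."""
    names = []
    for lease in leases:
        end = lease.get("lease_end_date")
        if end is not None and date.fromisoformat(end) < cutoff:
            names.append(lease["renter_name"])
    return names


def total_monthly_rent(leases: List[Dict[str, Any]]) -> float:
    """Sum the monthly amount across leases, treating a missing value as 0."""
    return sum(lease.get("monthly_amount") or 0 for lease in leases)


if __name__ == "__main__":
    print(leases_ending_before(LEASES, date(2025, 12, 31)))  # ['Jane Doe']
    print(total_monthly_rent(LEASES))  # 5400
```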
Next Steps
There's plenty more to DocuPipe. You can:
- Classify documents by type.
- Split a long document into smaller atomic sub-documents, using AI to decide where one ends and the next begins.
- Generate a Visual Review of a standardization to see exactly which pixels justify each decision our AI made.
- Build a Workflow to automate a sequence of events (e.g. upload -> classify -> standardize) in a single call. See Code Example if you're a developer.
- Developers: Use Webhooks to efficiently receive output payloads.
- Non-developers: Use our No Code Integration to orchestrate data ingestion and processing to 5000+ destinations without writing a line of code.
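For developers, receiving webhooks means running an HTTP endpoint that accepts POST requests. The following is a minimal sketch using only Python's standard library. It assumes the payload arrives as a JSON object in the request body and does not rely on any particular field names, since the actual payload format is defined by DocuPipe's webhook documentation. The port and the `process_payload` hook are hypothetical placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Optional


def parse_body(body: bytes) -> Optional[Dict[str, Any]]:
    """Decode a JSON object body; return None if it is not valid JSON or not an object."""
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None
    return payload if isinstance(payload, dict) else None


def process_payload(payload: Dict[str, Any]) -> None:
    """Hypothetical hook: replace with your own handling (store, enqueue, etc.)."""
    print("received webhook:", payload)


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_body(self.rfile.read(length))
        if payload is None:
            self.send_response(400)  # reject bodies that aren't a JSON object
            self.end_headers()
            return
        process_payload(payload)
        # Acknowledge quickly; in a real service, do heavy work asynchronously.
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    # Hypothetical port; expose this endpoint at the URL you register for webhooks.
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```

Returning 200 promptly and deferring slow processing keeps the sender from timing out on your endpoint.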
