Extraction Basics
Understanding Standardization
Every document you upload goes through two stages:
- Parsing captures the raw ingredients—text, layout, tables, checkboxes, and images.
- Standardization turns that parsed content into structured data, usually based on a Schema so downstream tools always see the same fields.
Uploading a document always produces a Parsing result. Running a Standardization job on top of that result gives you validated fields that can flow into workflows and integrations.
Working With Schemas
A Schema defines the exact fields, data types, and validation rules you expect in the output. The Quick Start (5 minutes) walks through creating one.
Use a schema when you want consistent fields and guardrails. You can also run schema-less jobs for exploratory work, but expect higher variance in structure and naming.
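To make "exact fields, data types, and validation rules" concrete, here is a minimal sketch of a schema and a check against it. Everything in it is an assumption for illustration: the field names, the dict layout, and the `validate` helper are hypothetical and do not reflect DocuPipe's actual schema format.

```python
from typing import Any

# Hypothetical schema: field names, types, and rules are illustrative only.
INVOICE_SCHEMA: dict[str, dict[str, Any]] = {
    "invoice_number": {"type": str, "required": True},
    "total_amount": {"type": float, "required": True},
    "due_date": {"type": str, "required": False},
}


def validate(record: dict[str, Any], schema: dict[str, dict[str, Any]]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, rule in schema.items():
        # A missing or null value is only a problem for required fields.
        if name not in record or record[name] is None:
            if rule["required"]:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
    # Fields outside the schema are flagged so output naming stays consistent.
    for name in record:
        if name not in schema:
            problems.append(f"unexpected field: {name}")
    return problems
```

A schema-less result would skip this kind of check entirely, which is why its structure and naming can vary from one document to the next.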
Parameter Reference
Standardization jobs expose a few controls so you can balance accuracy, speed, and credit usage.
Display Mode
Determines how the document is presented to the AI:
- Spatial keeps approximate page layout so positional cues survive.
- Sections streams the viewer output from top to bottom, emitting Markdown tables.
- Image shares pixels alongside text—ideal for handwriting, signatures, or complex tables.
- Auto lets DocuPipe choose for you; stick with this unless you have a specific requirement.
Split Mode
Large files can be split so the AI works on smaller chunks. You can split a document yourself or let DocuPipe decide.
- All splits aggressively (often page-by-page). Use this for repetitive forms but avoid it when a field spans multiple pages.
- Never keeps the file intact so the AI can use full-context cues. Works best for short (1–10 page) documents.
- Auto balances both approaches and is the default for most workflows.
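The difference between the three split modes can be pictured with a small chunking sketch. The `chunk_pages` function and its `auto_limit` threshold are hypothetical. In particular, the Auto branch is only a stand-in rule (keep short documents whole, otherwise cut fixed-size chunks) and is not how DocuPipe actually decides.

```python
def chunk_pages(pages: list, mode: str, auto_limit: int = 10) -> list[list]:
    """Group pages into chunks according to a split mode (illustrative only)."""
    if mode == "all":
        # Aggressive splitting: every page becomes its own chunk.
        return [[page] for page in pages]
    if mode == "never":
        # The whole document stays together as a single chunk.
        return [list(pages)]
    if mode == "auto":
        # Assumed heuristic: short documents stay whole, long ones are cut
        # into consecutive chunks of at most auto_limit pages.
        if len(pages) <= auto_limit:
            return [list(pages)]
        return [
            list(pages[start:start + auto_limit])
            for start in range(0, len(pages), auto_limit)
        ]
    raise ValueError(f"unknown split mode: {mode!r}")
```

With "all", a field that starts on page 3 and ends on page 4 lands in two separate chunks, which is the failure case the All bullet warns about.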
Effort Level
Effort Level trades credits for deeper reasoning:
- standard (default) is the fastest option for clean, predictable documents.
- high uses a more capable model and adds validation passes.
- extended goes deepest on long or tricky files where fields span many pages.
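The three controls in this reference can be collected into one settings object that rejects unknown values before a job is submitted. This is a sketch under assumptions: the `JobSettings` class, the lowercase value spellings, and the key names `display_mode`, `split_mode`, and `effort_level` are hypothetical and may not match the parameter names the DocuPipe API expects.

```python
from dataclasses import asdict, dataclass

# Allowed values mirror the options described in this reference.
# The key names and lowercase spellings are assumptions.
ALLOWED = {
    "display_mode": {"auto", "spatial", "sections", "image"},
    "split_mode": {"auto", "all", "never"},
    "effort_level": {"standard", "high", "extended"},
}


@dataclass(frozen=True)
class JobSettings:
    display_mode: str = "auto"       # recommended unless you have a specific need
    split_mode: str = "auto"         # default for most workflows
    effort_level: str = "standard"   # default, fastest option

    def __post_init__(self) -> None:
        # Fail early on typos instead of sending an invalid job.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


# Example: handwritten forms that need pixels and a more capable model.
settings = JobSettings(display_mode="image", effort_level="high")
payload = asdict(settings)
```

Keeping the defaults in one place makes it easy to see which jobs deviate from Auto and standard, and therefore which ones are likely to use more credits.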
Need tactics beyond these basics? Explore the advanced extraction guides referenced throughout the Document Extraction section.
