Extraction Basics

Understanding Standardization

Every document you upload goes through two stages:

  1. Parsing captures the raw ingredients—text, layout, tables, checkboxes, and images.
  2. Standardization turns that parsed content into structured data, usually based on a Schema so downstream tools always see the same fields.

Uploading a document always produces a Parsing result. Running a Standardization job on top of that result gives you validated fields that can flow into workflows and integrations.
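The two stages can be pictured as a small data flow. The sketch below is illustrative only: `ParsingResult`, `StandardizationResult`, `parse`, and `standardize` are hypothetical names rather than DocuPipe API calls, and the `extractor` callable stands in for the AI.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParsingResult:
    """Stage 1 output: raw content captured from the uploaded document."""
    document_id: str
    text: str


@dataclass
class StandardizationResult:
    """Stage 2 output: structured fields derived from a ParsingResult."""
    document_id: str
    schema_id: Optional[str]
    fields: dict


def parse(document_id: str, raw_text: str) -> ParsingResult:
    # Stand-in for the parsing that every upload goes through.
    return ParsingResult(document_id=document_id, text=raw_text)


def standardize(
    parsed: ParsingResult,
    extractor: Callable[[str], dict],
    schema_id: Optional[str] = None,
) -> StandardizationResult:
    # Stand-in for a Standardization job; `extractor` plays the role of the AI.
    return StandardizationResult(parsed.document_id, schema_id, extractor(parsed.text))


parsed = parse("doc-1", "Invoice 42 total 10.00")
result = standardize(
    parsed,
    lambda text: {"invoice_number": text.split()[1]},
    schema_id="invoice-v1",
)
print(result.fields)  # {'invoice_number': '42'}
```

The point of the sketch is the ordering: a Standardization result always refers back to an existing Parsing result, never to the raw file.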

Working With Schemas

A Schema defines the exact fields, data types, and validation rules you expect in the output. The Quick Start (5 minutes) walks through creating one.

Use a Schema when you want consistent fields and guardrails. You can also run schema-less jobs for exploratory work, but expect more variation in field names and structure.
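To make "fields, data types, and validation rules" concrete, here is a minimal sketch of a schema and a checker for extracted records. The schema layout, the `INVOICE_SCHEMA` fields, and the `validate` helper are assumptions made for illustration; DocuPipe's actual schema format may differ.

```python
# Hypothetical schema: field name -> expected type and simple rules.
INVOICE_SCHEMA = {
    "invoice_number": {"type": str, "required": True},
    "total": {"type": float, "required": True, "minimum": 0},
    "due_date": {"type": str, "required": False},
}


def validate(record: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, rule in schema.items():
        value = record.get(name)
        if value is None:
            if rule.get("required"):
                errors.append(f"{name}: missing required field")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: below minimum {rule['minimum']}")
    # Fields outside the schema are the "variance" a schema guards against.
    for name in record:
        if name not in schema:
            errors.append(f"{name}: unexpected field")
    return errors


print(validate({"invoice_number": "A-1", "total": 12.5}, INVOICE_SCHEMA))  # []
```

A schema-less job has no equivalent of this check, which is why its output can drift in naming and structure from one document to the next.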

Parameter Reference

Standardization jobs expose a few controls so you can balance accuracy, speed, and credit usage.

Display Mode

Display Mode determines how the document is presented to the AI:

  • Spatial keeps approximate page layout so positional cues survive.
  • Sections streams the viewer output from top to bottom and renders tables as Markdown.
  • Image shares pixels alongside text—ideal for handwriting, signatures, or complex tables.
  • Auto lets DocuPipe choose for you; stick with this unless you have a specific requirement.
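The guidance above can be written down as a small decision helper. This is a sketch, not DocuPipe's own selection logic: the function name, its flags, and the lowercase mode strings are all assumptions.

```python
def suggest_display_mode(
    has_handwriting: bool = False,
    has_signatures: bool = False,
    has_complex_tables: bool = False,
    layout_matters: bool = False,
    wants_markdown_tables: bool = False,
) -> str:
    """Suggest a display mode name (hypothetical strings) from document traits."""
    if has_handwriting or has_signatures or has_complex_tables:
        return "image"     # pixels are shared alongside the text
    if layout_matters:
        return "spatial"   # approximate page layout is preserved
    if wants_markdown_tables:
        return "sections"  # top-to-bottom stream with Markdown tables
    return "auto"          # no specific requirement: let DocuPipe choose


print(suggest_display_mode(has_signatures=True))  # image
```

Checking the image case first reflects the idea that handwriting and signatures cannot be recovered from text alone, whatever else is true of the document.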

Split Mode

Large files can be split so the AI works on smaller chunks. You can split a document yourself or let DocuPipe decide.

  • All splits aggressively (often page-by-page). Use this for repetitive forms but avoid it when a field spans multiple pages.
  • Never keeps the file intact so the AI can use full-context cues. Works best for short (1–10 page) documents.
  • Auto balances both approaches and is the default for most workflows.
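As a rough illustration of the trade-off, the sketch below shows the page ranges each explicit mode would hand to the AI, plus a helper that mirrors the advice in the list. The names and mode strings are hypothetical, "all" is modeled as strictly page-by-page for simplicity, and the behavior of Auto is not modeled because it is not described here.

```python
def chunk_pages(page_count: int, split_mode: str) -> list:
    """Return 1-based (first_page, last_page) ranges for a given split mode."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if split_mode == "never":
        return [(1, page_count)]  # one chunk with full context
    if split_mode == "all":
        return [(p, p) for p in range(1, page_count + 1)]  # one chunk per page
    raise ValueError(f"unsupported split mode in this sketch: {split_mode!r}")


def suggest_split_mode(
    page_count: int,
    repetitive_forms: bool = False,
    fields_span_pages: bool = False,
) -> str:
    """Mirror the guidance above; anything unclear falls back to 'auto'."""
    if fields_span_pages and page_count <= 10:
        return "never"
    if repetitive_forms and not fields_span_pages:
        return "all"
    return "auto"


print(chunk_pages(3, "all"))    # [(1, 1), (2, 2), (3, 3)]
print(chunk_pages(3, "never"))  # [(1, 3)]
```

With "all", a field that starts on page 1 and ends on page 2 lands in two separate chunks, which is the situation the list warns about.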

Effort Level

Effort Level trades credits for deeper reasoning:

  • standard (default) is the fastest option for clean, predictable documents.
  • high uses a more capable model and adds validation passes.
  • extended goes deepest on long or tricky files where fields span many pages.
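Taken together, the three controls form a small set of job parameters. The sketch below bundles them with validation; the `JobParams` class, its attribute names, and the lowercase value strings are assumptions for illustration, and only the `standard` default for effort is stated above (the `auto` defaults follow the recommendations in the earlier sections).

```python
from dataclasses import asdict, dataclass

# Allowed values, using hypothetical lowercase spellings.
DISPLAY_MODES = {"auto", "spatial", "sections", "image"}
SPLIT_MODES = {"auto", "all", "never"}
EFFORT_LEVELS = {"standard", "high", "extended"}


@dataclass(frozen=True)
class JobParams:
    """Bundle of Standardization controls with conservative defaults."""
    display_mode: str = "auto"
    split_mode: str = "auto"
    effort_level: str = "standard"

    def __post_init__(self):
        checks = (
            ("display_mode", DISPLAY_MODES),
            ("split_mode", SPLIT_MODES),
            ("effort_level", EFFORT_LEVELS),
        )
        for name, allowed in checks:
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


# A long contract whose fields span many pages: keep it whole, reason deeply.
params = JobParams(split_mode="never", effort_level="extended")
print(asdict(params))
```

Rejecting unknown values up front keeps a typo such as "extend" from silently falling back to a cheaper or more expensive setting than intended.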

Need tactics beyond these basics? Explore the advanced extraction guides referenced throughout the Document Extraction section.