Standardization Parameters

Understanding Standardization

Every document you upload goes through a Parsing stage and can then optionally go through a Standardization stage.

Parsing captures the raw ingredients—text, layout, tables, checkboxes, and images.
Standardization turns that parsed content into structured data, usually based on a Schema, so downstream tools always see the same fields.

Uploading a document always produces a Parsing result. Running a Standardization job on top of that result gives you validated fields that can flow into workflows and integrations.

Working With Schemas

A Schema defines the exact fields, data types, and validation rules you expect in the output. If you haven't created one yet, the Quick Start Guide walks you through it.

Almost every use case is better served by a schema. However, if you don't know what kind of document to expect and just want to condense it into a useful set of fields, you can standardize without one.
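To make the idea of "exact fields, data types, and validation rules" concrete, here is a minimal sketch of checking a standardized record against a schema. It does not use DocuPipe's actual schema format, which this page doesn't describe: it assumes a hypothetical schema written as a Python dict that maps each field name to an expected type and a required flag, and a hypothetical `validate` helper that reports every mismatch.

```python
from typing import Any

# Hypothetical schema for illustration only: field name -> (expected type, required?)
INVOICE_SCHEMA: dict[str, tuple[type, bool]] = {
    "invoice_number": (str, True),
    "total_amount": (float, True),
    "due_date": (str, False),
}


def validate(record: dict[str, Any], schema: dict[str, tuple[type, bool]]) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    errors: list[str] = []
    for field, (expected_type, required) in schema.items():
        # Absent or null fields are only an error when the schema requires them
        if field not in record or record[field] is None:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        # Present fields must carry the declared type
        if not isinstance(record[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, got {type(record[field]).__name__}"
            )
    # Undeclared fields are flagged so downstream tools never see surprise keys
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors
```

For example, `validate({"total_amount": "125.50"}, INVOICE_SCHEMA)` reports both the missing `invoice_number` and the string-typed amount, which is the kind of drift a schema is meant to catch before data reaches a workflow.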

Parameter Reference

Standardization jobs expose a few controls so you can balance accuracy, speed, and credit usage.

Effort Level

Effort Level trades more credits and time for improved accuracy and reasoning:

standard (default) is the fastest option for clean, predictable documents.
high uses a more capable model and adds validation passes.
extended goes deepest on long or tricky files where fields span many pages.
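The guidance above can be expressed as a rule of thumb. This is a hypothetical helper, not part of any DocuPipe SDK: the enum string values are assumed rather than confirmed API values, and the 10-page cutoff for "long" is an illustrative assumption.

```python
from enum import Enum


class EffortLevel(str, Enum):
    """Effort levels from the list above; the string values are assumed, not confirmed API values."""

    STANDARD = "standard"  # default: fastest, for clean, predictable documents
    HIGH = "high"          # more capable model plus additional validation passes
    EXTENDED = "extended"  # deepest pass, for long or tricky files


def choose_effort(page_count: int, fields_span_pages: bool, layout_is_predictable: bool) -> EffortLevel:
    """Hypothetical rule of thumb; the 10-page threshold is illustrative only."""
    # Long files whose fields straddle page boundaries get the deepest pass
    if fields_span_pages and page_count > 10:
        return EffortLevel.EXTENDED
    # Unpredictable layouts benefit from the stronger model and validation passes
    if not layout_is_predictable:
        return EffortLevel.HIGH
    # Otherwise keep the fast, lower-credit default
    return EffortLevel.STANDARD


print(choose_effort(page_count=2, fields_span_pages=False, layout_is_predictable=True).value)  # standard
```

Checking the most expensive case first means a document that needs the deepest pass is never downgraded; everything else falls through to the cheaper options.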

Display Mode

Display Mode determines how the document is presented to the AI:

Spatial keeps approximate page layout so positional cues survive.
Sections streams the viewer output from top to bottom, emitting Markdown tables.
Image sends the page pixels alongside the text, which is ideal for handwriting, signatures, or complex tables.
Auto lets DocuPipe choose for you; stick with this unless you have a specific requirement.
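Because Auto is the recommended default, a sensible pattern is to override it only for a concrete reason. The sketch below is a hypothetical helper that encodes the descriptions above; the lowercase enum values are assumed for illustration and are not confirmed API values.

```python
from enum import Enum


class DisplayMode(str, Enum):
    """Display modes from the list above; string values are assumed for illustration."""

    AUTO = "auto"          # let DocuPipe decide (recommended default)
    SPATIAL = "spatial"    # approximate page layout, preserves positional cues
    SECTIONS = "sections"  # top-to-bottom stream with Markdown tables
    IMAGE = "image"        # page pixels alongside the text


def choose_display_mode(
    has_handwriting: bool = False,
    has_signatures: bool = False,
    has_complex_tables: bool = False,
    positions_matter: bool = False,
) -> DisplayMode:
    """Hypothetical helper: only move away from Auto when there is a specific reason."""
    # Visual content that text alone may not capture is the clearest case for Image
    if has_handwriting or has_signatures or has_complex_tables:
        return DisplayMode.IMAGE
    # Documents where a value's meaning depends on where it sits on the page
    if positions_matter:
        return DisplayMode.SPATIAL
    # No specific requirement: stick with Auto
    return DisplayMode.AUTO
```

The helper never returns Sections; a caller with a specific reason to prefer the top-to-bottom Markdown view would set it explicitly.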

Split Mode

Large files can be split so the AI works on smaller chunks. You can split a document yourself or let DocuPipe decide.

All splits aggressively (often page by page). Use this for repetitive forms, but avoid it when a field spans multiple pages.
Never keeps the file intact so the AI can use full-context cues. Works best for short (1–10 page) documents.
Auto balances both approaches and is the default for most workflows.
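The trade-offs above can be sketched as a small decision function. This is a hypothetical helper, not a DocuPipe API; the enum string values are assumed, and the 10-page cutoff simply reuses the "short (1–10 page)" range mentioned for Never.

```python
from enum import Enum


class SplitMode(str, Enum):
    """Split modes from the list above; string values are assumed for illustration."""

    AUTO = "auto"    # balance splitting and full context (default)
    ALL = "all"      # split aggressively, often page by page
    NEVER = "never"  # keep the whole file in one context


def choose_split_mode(page_count: int, repetitive_forms: bool, fields_span_pages: bool) -> SplitMode:
    """Hypothetical helper encoding the split-mode trade-offs."""
    # Splitting would cut multi-page fields apart, so keep short files intact
    if fields_span_pages and page_count <= 10:
        return SplitMode.NEVER
    # Independent, repeated forms are safe to process one chunk at a time
    if repetitive_forms and not fields_span_pages:
        return SplitMode.ALL
    # Long files with cross-page fields, or anything ambiguous, go to Auto
    return SplitMode.AUTO
```

For instance, a 200-page stack of identical one-page forms maps to All, while a 6-page contract whose clauses run across pages maps to Never.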

Supported Languages

DocuPipe supports more than 100 languages, with print and handwriting recognition for the languages below.

English, Spanish, French, Hebrew, German, Italian, Portuguese, Chinese, Japanese, Korean, Russian, Arabic, Thai

Additional Languages (100+)

Hindi
Bengali
Vietnamese
Indonesian
Turkish
Polish
Dutch
Ukrainian
Persian
Tamil
Telugu
Urdu
Marathi
Punjabi
Gujarati
Malay
Romanian
Greek
Czech
Hungarian
Swedish
Filipino
Danish
Norwegian
Finnish
Bulgarian
Serbian
Croatian
Slovak
Slovenian
Lithuanian
Latvian
Estonian
Catalan
Albanian
Malayalam
Afrikaans
Nepali
Georgian
Armenian
Swahili
Kurdish
Pashto
And more