Standardization Basics
How DocuPipe turns documents into structured data, and the one setting you control when you run a Standardization
Understanding Standardization
Every document you upload goes through a Parsing stage, and then optionally a Standardization stage.
- Parsing captures the raw ingredients - text, layout, tables, checkboxes, and images.
- Standardization turns that parsed content into structured data, usually based on a Schema so downstream tools always see the same fields.
Uploading a document always produces a Parsing result. Running a Standardization job on top of that result gives you validated fields that can flow into workflows and integrations.
Working With Schemas
A Schema defines the exact fields, data types, and validation rules you expect in the output. The Quick Start Guide walks through creating one, if you haven't already made one.
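As a rough illustration of what "fields, data types, and validation rules" can mean in practice, here is a minimal sketch of a hypothetical invoice schema and a checker in plain Python. The field names, the dictionary layout, and the `validate` helper are all assumptions made for this example. They are not DocuPipe's actual schema format, which the Quick Start Guide covers.

```python
from datetime import date

# Hypothetical schema (not DocuPipe's format):
# field name -> (expected type, required?, validation rule).
INVOICE_SCHEMA = {
    "invoice_number": (str, True, lambda v: len(v) > 0),
    # date.fromisoformat raises ValueError on a malformed or impossible date.
    "issue_date": (str, True, lambda v: bool(date.fromisoformat(v))),
    "total_amount": (float, True, lambda v: v >= 0),
    "currency": (str, False, lambda v: len(v) == 3 and v.isupper()),
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems found in record; an empty list means valid."""
    problems = []
    for name, (expected_type, required, rule) in schema.items():
        if name not in record:
            if required:
                problems.append(f"{name}: missing required field")
            continue
        value = record[name]
        if not isinstance(value, expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
            continue
        try:
            ok = rule(value)
        except ValueError:
            ok = False
        if not ok:
            problems.append(f"{name}: failed validation rule")
    # Fields the schema does not know about are reported too, so every
    # accepted record has exactly the expected shape.
    for name in record:
        if name not in schema:
            problems.append(f"{name}: unexpected field")
    return problems
```

The point of the sketch is the contract: a record either matches the declared fields, types, and rules, or it comes back with an explicit list of problems, which is what lets downstream tools rely on always seeing the same fields.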
A schema serves almost every use case better. If you don't know what kind of document you're dealing with and just want to condense it into a useful set of fields, you can standardize without one.
Standardizing with a schema runs on our latest extraction engine. Schemaless standardization still works and produces a sensible structure that DocuPipe infers for you.
The One Setting You Control
DocuPipe's extraction engine reads your document page by page, decides its own strategy, and handles page layout and document splitting automatically. There are no display mode or split mode settings to pick - the engine figures those out for you.
The one control you have is the Effort Level:
- standard (default, 2 credits per page) handles most documents - clean invoices, forms, and reports.
- high (4 credits per page) uses more capable models for long, dense, or complex documents where maximum accuracy matters.
Start with standard, then re-run on high only the documents that come back with missed or wrong fields.
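To see what this strategy costs, the arithmetic can be sketched in a few lines of Python. The per-page rates come from the list above. The helper names are made up for this example, and the sketch assumes that a re-run on high is billed at the full high rate in addition to the original standard run, which is an assumption rather than a statement of DocuPipe's billing rules.

```python
# Credits per page for each Effort Level, as listed above.
CREDITS_PER_PAGE = {"standard": 2, "high": 4}


def credits_for(pages: int, effort: str = "standard") -> int:
    """Credits for one Standardization run over a document with this many pages."""
    if effort not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown effort level: {effort!r}")
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * CREDITS_PER_PAGE[effort]


def escalation_cost(page_counts, rerun_indices) -> int:
    """Total credits when every document runs on standard first and the
    documents at rerun_indices are then re-run on high.

    Assumption: the first pass is still billed for re-run documents.
    """
    first_pass = sum(credits_for(p, "standard") for p in page_counts)
    second_pass = sum(credits_for(page_counts[i], "high") for i in set(rerun_indices))
    return first_pass + second_pass


# 100 documents of 10 pages each, with 10 of them re-run on high.
page_counts = [10] * 100
standard_first = escalation_cost(page_counts, range(10))       # 2000 + 400 = 2400
all_high = sum(credits_for(p, "high") for p in page_counts)    # 4000
```

Under that billing assumption, running standard first costs 2 credits per page plus 4 credits for each re-run page, so it stays cheaper than running everything on high as long as fewer than half of the pages need a re-run.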
For a fuller walkthrough of how the engine works, page attribution, and when high effort is worth it, see the V3 Extraction Engine guide.
Supported Languages
DocuPipe supports 100+ languages. We support print and handwriting recognition for the languages below.
English, Spanish, French, Hebrew, German, Italian, Portuguese, Chinese, Japanese, Korean, Russian, Arabic, Thai
Additional Languages (100+)
- Hindi
- Bengali
- Vietnamese
- Indonesian
- Turkish
- Polish
- Dutch
- Ukrainian
- Persian
- Tamil
- Telugu
- Urdu
- Marathi
- Punjabi
- Gujarati
- Malay
- Romanian
- Greek
- Czech
- Hungarian
- Swedish
- Filipino
- Danish
- Norwegian
- Finnish
- Bulgarian
- Serbian
- Croatian
- Slovak
- Slovenian
- Lithuanian
- Latvian
- Estonian
- Catalan
- Albanian
- Malayalam
- Afrikaans
- Nepali
- Georgian
- Armenian
- Swahili
- Kurdish
- Pashto
- And more
