V3 Extraction Engine
DocuPipe's next-generation agentic extraction engine - higher quality, fewer knobs, and smarter document handling
What's New
V3 is a ground-up rebuild of how DocuPipe extracts data from your documents. Instead of a fixed pipeline that requires you to pick the right configuration, V3 uses an agentic AI that reads your document page by page - deciding its own strategy, self-correcting mistakes, and adapting to whatever it finds.
The result: better extraction quality with less configuration on your end.
V3 vs V2 at a Glance
| | V2 | V3 |
|---|---|---|
| How it works | Fixed pipeline with manual knobs | Agentic AI that reads and reasons page by page |
| Configuration | You choose display mode, split mode, effort level | Just pick standard or high effort - everything else is automatic |
| Quality | ~89% accuracy on our eval suite | ~95% accuracy on the same suite |
| Page attribution | Not available | Know which page each extracted field came from |
| Schema required | Optional | Required (schemaless extraction stays on V2) |
No More Manual Knobs
In V2, you had to guess the right combination of display mode (spatial, sections, image), split mode (auto, never, all), and effort level (standard, high, extended) for your documents. Pick wrong, and extraction quality suffered.
V3 removes all of that. The agent inspects each page and automatically decides how to process it. You just tell it what to extract (your Schema) and it figures out the rest.
Effort Levels
V3 offers two effort levels that control which AI models power the extraction:
| Effort | Credits per Page | Best For |
|---|---|---|
| Standard (default) | 2 | Most documents - clean invoices, forms, reports |
| High | 4 | Complex or dense documents where maximum accuracy matters |
Both use the same agentic architecture. The difference is that high uses more capable (and more expensive) models in the extraction loop.
V2's extended effort level (5 credits/page) is replaced by V3's agentic approach. V3 standard already outperforms V2 extended on most documents, at lower cost.
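As a rough illustration of the pricing above, the sketch below estimates the credit cost of a single document. The helper name `estimate_credits` and the `CREDITS_PER_PAGE` lookup are made up for this example and are not part of any DocuPipe SDK; only the per-page rates come from the effort table.

```python
# Per-page V3 credit rates, copied from the effort table above.
CREDITS_PER_PAGE = {"standard": 2, "high": 4}


def estimate_credits(num_pages: int, effort_level: str = "standard") -> int:
    """Estimate credits for one document (hypothetical helper, not a DocuPipe API)."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    if effort_level not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown effort level: {effort_level!r}")
    return num_pages * CREDITS_PER_PAGE[effort_level]
```

Under these rates, a 12-page document comes to 24 credits on standard and 48 on high.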
How V3 Processes Your Document
- Receives your Schema and understands what fields to look for
- Reads through the document page by page, extracting fields as it goes
- Decides when to move on - staying on dense pages longer and advancing past simple ones
- Validates and post-processes the result against your schema
Page Attribution
V3 tracks which page each extracted field came from. This is available as a pageMap on the Standardization result - a mapping from field paths to 1-indexed page numbers. Note that this differs from the pages request parameter, which is 0-indexed.
For example, if your schema extracts vendor.name from page 1 and lineItems.0.description from page 3, the pageMap would reflect that. This is useful for building UIs that jump to the source page when a user clicks on a field.
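One way to use this in a UI is to invert the mapping so that each page lists the fields found on it. The sketch below assumes the pageMap arrives as a flat object keyed by dotted field paths, as in the example above; the exact payload shape is an assumption, and the `lineItems.1.description` entry and the `fields_by_page` helper are illustrative only.

```python
from collections import defaultdict

# Illustrative pageMap: dotted field paths mapped to 1-indexed page numbers.
# The flat-dictionary shape is assumed, not confirmed by the API reference.
page_map = {
    "vendor.name": 1,
    "lineItems.0.description": 3,
    "lineItems.1.description": 3,
}


def fields_by_page(page_map: dict[str, int]) -> dict[int, list[str]]:
    """Invert a pageMap so each page number lists the fields found on it."""
    grouped: dict[int, list[str]] = defaultdict(list)
    for field_path, page_number in page_map.items():
        grouped[page_number].append(field_path)
    # Return a plain dict ordered by page, with field paths sorted for stable display.
    return {page: sorted(paths) for page, paths in sorted(grouped.items())}
```

A viewer could then call `fields_by_page(page_map)` once and highlight the relevant fields whenever the user navigates to a page.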
What Stays the Same
- Schemas work exactly as before. No changes needed to your existing schemas.
- Guidelines still apply and are passed to the V3 agent.
- Output format is the same JSON structure you're used to.
- Webhooks fire the same standardization.processed.success and standardization.processed.error events.
- Downloads (JSON, Excel, XML, CSV) all work the same way.
- Credit pricing for standard effort is unchanged at 2 credits per page.
Using V3
From the Dashboard
V3 is now the default when you run a Standardization from the dashboard. Select your documents, click Standardize, choose your schema, and optionally set the effort level to high for complex documents.
From the API
Use the POST /v3/standardize endpoint:
{
"documentId": "your-document-id",
"schemaId": "your-schema-id",
"effortLevel": "standard"
}
Optional parameters:
- guidelines - additional extraction instructions
- useMetadata - include document metadata in extraction context
- pages - extract only specific pages (0-indexed)
The response includes a jobId and standardizationId for tracking progress.
V2 endpoints remain available and are not being removed. If you have integrations using POST /standardize or POST /standardize/batch, they will continue to work.
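The following sketch shows one way to assemble the request body for POST /v3/standardize in Python. The field names come from this page; the value types for guidelines (string), useMetadata (boolean), and pages (list of integers) are assumptions, and `build_standardize_body` is a hypothetical helper. The base URL and authentication header are not covered here, so the sketch stops short of sending the request.

```python
import json

ALLOWED_EFFORT_LEVELS = {"standard", "high"}


def build_standardize_body(
    document_id,
    schema_id,
    effort_level="standard",
    guidelines=None,    # assumed: free-text string
    use_metadata=None,  # assumed: boolean
    pages=None,         # assumed: list of 0-indexed page numbers
):
    """Build a JSON-serializable body for POST /v3/standardize (illustrative)."""
    if effort_level not in ALLOWED_EFFORT_LEVELS:
        raise ValueError(f"effortLevel must be one of {sorted(ALLOWED_EFFORT_LEVELS)}")
    body = {
        "documentId": document_id,
        "schemaId": schema_id,
        "effortLevel": effort_level,
    }
    # Only include optional parameters the caller actually set.
    if guidelines is not None:
        body["guidelines"] = guidelines
    if use_metadata is not None:
        body["useMetadata"] = use_metadata
    if pages is not None:
        body["pages"] = list(pages)
    return body


if __name__ == "__main__":
    example = build_standardize_body(
        "your-document-id", "your-schema-id", effort_level="high", pages=[0, 2]
    )
    print(json.dumps(example, indent=2))
```

Leaving unset optional parameters out of the body keeps the request identical to the minimal example above when only the three core fields are supplied.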
When to Use High Effort
Standard effort handles most documents well. Consider switching to high when:
- Documents are long (10+ pages) with dense tables spanning multiple pages
- You're seeing missed fields on complex layouts
- The document has unusual formatting that requires deeper reasoning
- Maximum accuracy is more important than cost
You can always start with standard and selectively re-run problem documents on high.
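A minimal sketch of that "start with standard, escalate selectively" approach is shown below. It flags a document for a high-effort re-run when any field you consider required came back missing or empty. The helpers `get_path` and `should_rerun_on_high`, the list of required paths, and the dotted-path lookup into the output JSON are all assumptions for illustration; DocuPipe does not prescribe this check.

```python
def get_path(data, path):
    """Follow a dotted path such as 'lineItems.0.description'; return None if absent."""
    current = data
    for key in path.split("."):
        if isinstance(current, list):
            # Numeric segments index into lists.
            if not key.isdigit() or int(key) >= len(current):
                return None
            current = current[int(key)]
        elif isinstance(current, dict):
            if key not in current:
                return None
            current = current[key]
        else:
            return None
    return current


def should_rerun_on_high(result, required_paths, effort_level):
    """Return True if a standard-effort result is missing any required field."""
    if effort_level == "high":
        return False  # already at the highest V3 effort level
    return any(get_path(result, path) in (None, "") for path in required_paths)
```

For example, a result where `vendor.name` is populated but `lineItems.0.description` is an empty string would be flagged for a re-run on high.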
Supported Languages
V3 supports the same 100+ languages as V2, including print and handwriting recognition for English, Spanish, French, Hebrew, German, Italian, Portuguese, Chinese, Japanese, Korean, Russian, Arabic, and Thai.
FAQ
Do I need to change my schemas for V3? No. Existing schemas work as-is with V3.
Can I still use V2? Yes. V2 endpoints are not being removed. Schemaless extractions automatically use V2.
Is V3 slower than V2? V3 may take slightly longer on multi-page documents since it processes pages sequentially. Single-page documents are comparable in speed. The quality improvement more than offsets any time difference.
Does V3 cost more? Standard effort is the same price as V2 standard (2 credits/page). High effort is 4 credits/page, the same as V2 high. V2's extended tier (5 credits/page) has no V3 equivalent because V3 standard already exceeds V2 extended quality for most documents.
