Standardize V2 (Legacy)

POST https://app.docupipe.ai/v2/standardize/batch

Standardize a batch of documents, either by passing a list of document IDs or by passing a dataset name. Pass a schemaId to standardize the documents using a specific structure, or leave it empty to let the AI create an ad-hoc structure. Standardization handles lists (arrays) by splitting documents into smaller sub-documents behind the scenes; the AI decides how and when splitting is appropriate.

You can set the parameters below. displayMode and splitMode default to auto, which lets the AI decide; effortLevel defaults to standard.

displayMode - Controls how the AI sees the document. The options are:
- auto - Automatically determine the best mode based on the document content.
- spatial - Represent text in the document according to its spatial layout.
- sections - Represent the document as a list of sections (paragraphs, tables, images, etc.) as seen in the web UI.
- image - Represent the document as an image, accompanied by the section view.
splitMode - Controls how the AI splits the document into sub-documents. The options are:
- auto - Automatically determine the best mode based on the document content.
- all - Split the document into single-page sub-documents, so each page is handled separately.
- never - Do not split the document at all, so the entire document is handled as a single unit. This can lead to poor performance for long documents, or documents with lots of dense data that needs to be extracted.
effortLevel - Controls how much effort the AI puts into the standardization. The options are:
- standard - Use the standard effort level.
- high - Use the high effort level, which takes longer but can produce better results. Costs an additional 2 credits per page.
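
As a rough illustration of the effortLevel surcharge described above, the sketch below computes only the extra credits for a job. The function name `effort_surcharge` is hypothetical, and the base per-page cost is not stated here, so it is left out.

```python
# Extra credits per page for effortLevel "high", taken from the description above.
HIGH_EFFORT_SURCHARGE_PER_PAGE = 2


def effort_surcharge(page_count: int, effort_level: str = "standard") -> int:
    """Return the additional credits charged for the chosen effort level.

    Hypothetical helper: it covers only the surcharge, not the base cost.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if effort_level == "standard":
        return 0
    if effort_level == "high":
        return page_count * HIGH_EFFORT_SURCHARGE_PER_PAGE
    raise ValueError(f"unknown effortLevel: {effort_level!r}")
```

For example, a 10-page document at high effort would add 20 credits on top of whatever the standard cost is.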

Note: The guidelines field has a maximum length of 50,000 characters.
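
The options and the guidelines limit above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, and the `image` display mode is taken from the displayMode parameter description further down this page.

```python
# Allowed values as listed on this page.
DISPLAY_MODES = {"auto", "spatial", "sections", "image"}
SPLIT_MODES = {"auto", "all", "never"}
EFFORT_LEVELS = {"standard", "high"}
MAX_GUIDELINES_CHARS = 50_000


def build_payload(document_ids, schema_id=None, guidelines=None,
                  display_mode="auto", split_mode="auto",
                  effort_level="standard"):
    """Assemble a request body dict, rejecting values this page does not list."""
    if display_mode not in DISPLAY_MODES:
        raise ValueError(f"invalid displayMode: {display_mode!r}")
    if split_mode not in SPLIT_MODES:
        raise ValueError(f"invalid splitMode: {split_mode!r}")
    if effort_level not in EFFORT_LEVELS:
        raise ValueError(f"invalid effortLevel: {effort_level!r}")
    if guidelines is not None and len(guidelines) > MAX_GUIDELINES_CHARS:
        raise ValueError("guidelines exceeds 50,000 characters")

    payload = {
        "documentIds": list(document_ids),
        "displayMode": display_mode,
        "splitMode": split_mode,
        "effortLevel": effort_level,
    }
    # Optional fields are omitted entirely when not set.
    if schema_id is not None:
        payload["schemaId"] = schema_id
    if guidelines is not None:
        payload["guidelines"] = guidelines
    return payload
```

Omitting schemaId, as in `build_payload(["doc-1"])`, corresponds to the ad-hoc structure case described above.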

Body Params

documentIds

array of strings

required

List of document IDs to be standardized, up to 100 per batch.
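
Because a batch accepts at most 100 document IDs, a longer list has to be split across several requests. A minimal sketch, where `chunk_document_ids` is a hypothetical helper name:

```python
MAX_IDS_PER_BATCH = 100  # limit stated for documentIds


def chunk_document_ids(document_ids, batch_size=MAX_IDS_PER_BATCH):
    """Split a list of document IDs into consecutive batches of at most batch_size."""
    if not 1 <= batch_size <= MAX_IDS_PER_BATCH:
        raise ValueError("batch_size must be between 1 and 100")
    ids = list(document_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
```

Each returned batch can then be sent as the documentIds value of its own request.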

schemaId

string

Unique identifier of the schema to be used for standardization - if not provided, one will be inferred.

guidelines

string

Guidelines to apply to the schema when standardizing. If this is provided, it will override the schema guidelines.

useMetadata

boolean

Defaults to false

Whether to use metadata during standardization.

displayMode

string

enum

Defaults to auto

Advanced feature. Mode of display to run. The options are:
- auto - AI decides how to display the document (default).
- spatial - Display text spatially, as it appears in the document.
- sections - Display text from top to bottom as sections, with tables appearing as markdown.
- image - Display as an image, accompanied by the section view.

splitMode

string

enum

Defaults to auto

Advanced feature. Mode of splitting to run. Splitting is used to extract array fields efficiently. The options are:
- auto - AI decides how to split the document (default).
- never - Never split the document (this could lead to errors or poor performance for large documents).
- all - Split the document into individual pages.

effortLevel

string

enum

Defaults to standard

Advanced feature. Level of effort to run. The options are:
- standard - Standard effort level (default).
- high - High effort level, for more difficult documents.

stdVersion

number

enum

Defaults to 2.2

Version of the standardization job. Options: 2.0, 2.1, 2.2 (default, stable), 2.3 (experimental, higher quality but may have runtime instability).

pages

array of arrays of integers

Advanced feature. For every document, a list of the pages that you want to standardize. Page numbers are zero-indexed positions within each uploaded document. If not provided, the entire document will be standardized.
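
Since pages is an array of arrays with zero-indexed page numbers, it is easy to be off by one when starting from the 1-based page numbers shown in most viewers. The sketch below converts them. It assumes that the inner lists line up positionally with documentIds, which the description above suggests but does not state outright; `build_pages` is a hypothetical helper.

```python
def build_pages(document_ids, one_based_pages):
    """Build the pages value from 1-based page numbers.

    one_based_pages maps each document ID to the 1-based page numbers to
    standardize. Every document ID needs an entry, because how to mark
    "all pages" for a single document inside the array is not documented here.
    """
    pages = []
    for doc_id in document_ids:
        selected = one_based_pages[doc_id]  # KeyError if a document is missing
        if any(p < 1 for p in selected):
            raise ValueError(f"page numbers for {doc_id!r} must be >= 1")
        # Convert to zero-indexed positions, sorted and de-duplicated.
        pages.append(sorted({p - 1 for p in selected}))
    return pages
```

For instance, selecting pages 1 and 2 of the first document and page 3 of the second yields `[[0, 1], [2]]`.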

timeout

integer

The job timeout (in seconds) for webhook error reporting
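
The body parameters above can be combined into a single JSON request. The sketch below only builds the request object and does not send it. The `X-API-Key` header name is an assumption, since this page does not describe authentication, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

ENDPOINT = "https://app.docupipe.ai/v2/standardize/batch"


def build_request(api_key, document_ids, schema_id=None,
                  std_version=2.2, timeout=None):
    """Return an unsent urllib Request carrying the JSON body."""
    body = {
        "documentIds": list(document_ids),
        "useMetadata": False,        # documented default
        "stdVersion": std_version,   # 2.2 is the documented stable default
    }
    if schema_id is not None:
        body["schemaId"] = schema_id
    if timeout is not None:
        body["timeout"] = timeout    # job timeout in seconds
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,    # assumed header name, not confirmed here
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch stays free of network calls.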

Responses

200 Successful Response

400 Bad Request

402 Payment Required

404 Not Found

422 Unprocessable Entity
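
A client will usually want to turn the non-200 statuses listed above into errors. A minimal sketch: the status labels come from this page, while `StandardizeError` and `check_status` are hypothetical names and the response body format is not assumed.

```python
# Status codes and labels as listed under Responses.
STATUS_MEANINGS = {
    200: "Successful Response",
    400: "Bad Request",
    402: "Payment Required",
    404: "Not Found",
    422: "Unprocessable Entity",
}


class StandardizeError(Exception):
    """Raised when the endpoint returns anything other than 200."""


def check_status(status_code: int) -> str:
    """Return the label for a 200 response, or raise for any other status."""
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status")
    if status_code != 200:
        raise StandardizeError(f"{status_code} {meaning}")
    return meaning
```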