Standardize V2 (Legacy)

Standardize a batch of documents, either by passing a list of document IDs or by passing a dataset name. Pass a schemaId to standardize the documents against a specific structure, or leave it empty to let the AI create an ad-hoc structure as it sees fit. Standardization handles lists (arrays) by splitting documents into smaller sub-documents behind the scenes; the AI will do its best to decide how and when it is appropriate to split.

You can set the following parameters. displayMode and splitMode default to auto, which lets the AI decide; effortLevel defaults to standard.

  1. displayMode - Controls how the AI sees the document. The options are:
    • auto - Automatically determine the best mode based on the document content.
    • spatial - Represent text in the document according to its spatial layout.
    • sections - Represent the document as a list of sections (paragraphs, tables, images, etc.) as seen in the web UX.
  2. splitMode - Controls how the AI splits the document into sub-documents. The options are:
    • auto - Automatically determine the best mode based on the document content.
    • all - Split the document into single-page sub-documents, so each page is handled separately.
    • never - Do not split the document at all, so the entire document is handled as a single unit. This can lead to poor performance for long documents, or documents with lots of dense data that needs to be extracted.
  3. effortLevel - Controls how much effort the AI puts into the standardization. The options are:
    • standard - Use the standard effort level.
    • high - Use the high effort level, which takes longer but can produce better results. Costs +2 credits per page.
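The high effort surcharge is simple arithmetic, sketched below. Only the extra +2 credits per page comes from this page; the base cost per page is not stated here, so it is left out. The function name `high_effort_surcharge` is hypothetical.

```python
HIGH_EFFORT_SURCHARGE_PER_PAGE = 2  # extra credits per page for effortLevel "high"


def high_effort_surcharge(page_count: int, effort_level: str = "standard") -> int:
    """Return only the additional credits charged for the chosen effort level."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if effort_level not in ("standard", "high"):
        raise ValueError(f"unknown effortLevel: {effort_level!r}")
    if effort_level == "high":
        return HIGH_EFFORT_SURCHARGE_PER_PAGE * page_count
    return 0


# A 40-page document at high effort costs 80 credits on top of the base price.
surcharge = high_effort_surcharge(40, "high")
```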

Note: The guidelines field has a maximum length of 50,000 characters.
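As a rough sketch of how these options fit together, the Python helper below assembles a request body and checks the limits stated on this page before anything is sent. The helper name `build_standardize_body` is hypothetical, and the HTTP call itself (endpoint URL, authentication) is deliberately omitted because it is not described here. The field names are the ones used in this section.

```python
from typing import Optional

# "image" is listed only in the displayMode parameter description.
DISPLAY_MODES = {"auto", "spatial", "sections", "image"}
SPLIT_MODES = {"auto", "all", "never"}
EFFORT_LEVELS = {"standard", "high"}
MAX_GUIDELINES_CHARS = 50_000
MAX_DOCUMENTS_PER_BATCH = 100


def build_standardize_body(
    document_ids: list[str],
    schema_id: Optional[str] = None,
    guidelines: Optional[str] = None,
    display_mode: str = "auto",
    split_mode: str = "auto",
    effort_level: str = "standard",
) -> dict:
    """Build the JSON body for a standardization request (hypothetical helper)."""
    if len(document_ids) > MAX_DOCUMENTS_PER_BATCH:
        raise ValueError(f"at most {MAX_DOCUMENTS_PER_BATCH} document IDs per batch")
    if guidelines is not None and len(guidelines) > MAX_GUIDELINES_CHARS:
        raise ValueError(f"guidelines exceed {MAX_GUIDELINES_CHARS} characters")
    if display_mode not in DISPLAY_MODES:
        raise ValueError(f"unknown displayMode: {display_mode!r}")
    if split_mode not in SPLIT_MODES:
        raise ValueError(f"unknown splitMode: {split_mode!r}")
    if effort_level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effortLevel: {effort_level!r}")

    body = {
        "documentIds": list(document_ids),
        "displayMode": display_mode,
        "splitMode": split_mode,
        "effortLevel": effort_level,
    }
    # Optional fields are only included when set, so the service can infer them.
    if schema_id is not None:
        body["schemaId"] = schema_id
    if guidelines is not None:
        body["guidelines"] = guidelines
    return body


body = build_standardize_body(["doc-1", "doc-2"], schema_id="schema-1", split_mode="never")
```

Validating on the client side is a design choice rather than a requirement; it simply surfaces limit violations before a request is made.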

Body Params
documentIds
array of strings
required

List of document IDs to be standardized, up to 100 per batch.
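Because a batch accepts at most 100 document IDs, a larger collection has to be sent as several requests. A minimal sketch of the chunking step, with the hypothetical helper name `chunk_document_ids`:

```python
def chunk_document_ids(document_ids: list[str], batch_size: int = 100) -> list[list[str]]:
    """Split a list of document IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        document_ids[start:start + batch_size]
        for start in range(0, len(document_ids), batch_size)
    ]


# 250 IDs become three batches of 100, 100, and 50.
all_ids = [f"doc-{n}" for n in range(250)]
batches = chunk_document_ids(all_ids)
```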

schemaId
string

Unique identifier of the schema to be used for standardization - if not provided, one will be inferred.

guidelines
string

Guidelines to apply to the schema when standardizing. If this is provided, it will override the schema guidelines.

boolean
Defaults to false

Whether to use metadata during standardization.

displayMode
string
enum
Defaults to auto

Advanced feature. Mode of display to run. The options are:
  • auto: AI decides how to display the document (default)
  • spatial: Display text spatially, as it appears in the document
  • sections: Display text from top to bottom as sections, with tables appearing as markdown
  • image: Display as an image, accompanied by section view

splitMode
string
enum
Defaults to auto

Advanced feature. Mode of splitting to run. Splitting is used to extract array fields efficiently. The options are:
  • auto: AI decides how to split the document (default)
  • never: Never split the document (this could lead to errors or poor performance for large documents)
  • all: Split the document into individual pages

effortLevel
string
enum
Defaults to standard

Advanced feature. Level of effort to run. The options are:
  • standard: Standard effort level (default)
  • high: High effort level, for more difficult documents

number
enum
Defaults to 2.2

Version of the standardization job. Options: 2.0, 2.1, 2.2 (default, stable), 2.3 (experimental, higher quality but may have runtime instability).

pages
array of arrays of integers

Advanced feature. For every document, the list of all pages that you want to standardize. Page numbers are zero-indexed positions within each uploaded document. If not provided, the entire document will be standardized.
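Since page numbers are zero-indexed, human-style page ranges (where the first page is 1) need converting before they go into pages. The sketch below does that conversion. The helper name `to_zero_indexed_pages` is hypothetical, and it assumes that the outer list lines up with documentIds in the same order, which this page implies but does not state outright.

```python
def to_zero_indexed_pages(ranges_per_document: list[list[tuple[int, int]]]) -> list[list[int]]:
    """Convert 1-based inclusive (first, last) page ranges into zero-indexed page lists.

    The outer list holds one entry per document (assumed to match documentIds order).
    """
    pages = []
    for ranges in ranges_per_document:
        doc_pages = set()
        for first, last in ranges:
            if first < 1 or last < first:
                raise ValueError(f"invalid page range: ({first}, {last})")
            # Page 1 for a human reader is position 0 in the uploaded document.
            doc_pages.update(range(first - 1, last))
        pages.append(sorted(doc_pages))
    return pages


# Document 1: pages 1-3; document 2: page 2 and pages 5-6.
pages = to_zero_indexed_pages([[(1, 3)], [(2, 2), (5, 6)]])
# pages == [[0, 1, 2], [1, 4, 5]]
```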

integer

The job timeout (in seconds) for webhook error reporting

Responses
