# DocuPipe Documentation

## Guides

- [Using Workspaces to Separate Environments](https://docs.docupipe.ai/docs/using-workspaces-to-separate-enviornments.md): Collaborate with others on the same data. Partition development from production, or differentiate projects with completely separate environments.
- [Classifying Documents](https://docs.docupipe.ai/docs/classifying-documents.md)
- [Configuring Classes](https://docs.docupipe.ai/docs/configuring-classes.md)
- [Workflow: Classify -> Extract](https://docs.docupipe.ai/docs/workflows-dashboard.md): Automate upload, classification, and standardization without writing code.
- [Standardization Parameters](https://docs.docupipe.ai/docs/extraction-basics.md): Standardizing documents with a Schema involves several configuration parameters. Learn what they are and how they work.
- [Quick Start (5 minutes)](https://docs.docupipe.ai/docs/quick-start.md)
- [Troubleshooting Extractions](https://docs.docupipe.ai/docs/troubleshooting-extractions.md): What to do when something is missed in a Standardization you've generated.
- [Splitting a Document](https://docs.docupipe.ai/docs/splitting-a-document.md)
- [Workflow: Split -> Classify -> Extract](https://docs.docupipe.ai/docs/workflow-split-classify-extract.md)
- [No Code Integration Using Make.com](https://docs.docupipe.ai/docs/no-code-integration-using-makecom.md): Use Make to call any DocuPipe API function without writing code.
- [Getting Started](https://docs.docupipe.ai/docs/getting-started.md)
- [Generating a Visual Review](https://docs.docupipe.ai/docs/generating-a-visual-review.md): See exactly where each extraction came from with yellow-marker annotations overlaid on your original document.

## API Reference

- [Root](https://docs.docupipe.ai/reference/root-1.md)
- [Get Account Information](https://docs.docupipe.ai/reference/get_account-1.md): Get information about your account, including the plan name, number of remaining credits, the number of
overage credits, and the details of the upcoming invoice.
- [Delete Multiple Analyses](https://docs.docupipe.ai/reference/delete_analyses-1.md): Delete multiple analyses at once by providing a list of analysis IDs.
- [Retrieve Analysis](https://docs.docupipe.ai/reference/get_analysis-1.md): Retrieve an analysis object, which contains the questions and answers, by providing the analysis ID.
- [List Analyses](https://docs.docupipe.ai/reference/list_analyses-1.md): List all analyses that have been performed.
- [Analyze Data](https://docs.docupipe.ai/reference/post_analyze_data-1.md): Analyze multiple documents at once, either by passing a list of Document IDs or by passing a dataset name. If both are passed, the intersection of the two is used. Analysis works by passing a list of questions in natural language. If a schemaId is passed, the AI first uses the standardizations of those documents under the provided schema to narrow down which documents are relevant. Only then does it analyze the documents and provide answers to the questions, along with confidence scores and citations. If a schemaId is not passed, the AI will manually examine all documents, with a limit of 50. When a schemaId is passed, the AI can optionally also perform database queries for statistical calculations and answer the questions based on those results.
- [Analyze Document](https://docs.docupipe.ai/reference/post_analyze_document-1.md): Analyze a single document by passing a `documentId` and a list of questions in natural language. If the `pages` parameter is provided, the AI will only analyze the specified pages. Poll for the results using the `GET /job/{jobId}` endpoint with the returned jobId.
- [Delete a Class](https://docs.docupipe.ai/reference/delete_class-1.md): Delete a class from the taxonomy of classes by providing the class ID.
- [List Classes](https://docs.docupipe.ai/reference/list_classes-1.md): List all classes that have been defined in the taxonomy.
- [Add a Class](https://docs.docupipe.ai/reference/post_add_class-1.md): Add a new class to the taxonomy of classes, including name and description.
- [Classify Documents](https://docs.docupipe.ai/reference/post_classify_batch-1.md): Classify a batch of one or multiple documents all at once by passing a list of document IDs, and an optional list of class IDs to use for classification. If no class IDs are provided, all classes will be used. To use the `unknown` class, either pass its classId ('unknown') or set the includeUnknown flag as True.
- [Copy a Class to Another Workspace](https://docs.docupipe.ai/reference/post_copy_classification.md): Duplicate one of your taxonomy classes into a different workspace. Provide the existing classId and the target workspaceId to create a copy with a new classId owned by the destination workspace.
- [Delete a Document](https://docs.docupipe.ai/reference/delete_document-1.md): Delete a document that has been previously submitted to DocuPipe for processing.
- [Delete Multiple Documents](https://docs.docupipe.ai/reference/delete_documents-1.md): Delete multiple documents that have been previously submitted to DocuPipe for processing. You can provide a list of document IDs to delete multiple documents at once.
- [Download OCR URL](https://docs.docupipe.ai/reference/download_ocr_url-1.md): Generate an OCR PDF by providing a document ID. This will add the OCR text into a layer on top of the PDF, allowing you to search the PDF by the OCR text. Returns a presigned URL to download the OCR PDF. This URL is valid for a limited time (e.g., 1 hour) and allows secure access to the OCR PDF stored on DocuPipe. Note that this endpoint only works for documents with PDF file types.
- [Download Original URL](https://docs.docupipe.ai/reference/download_original_url-1.md): Generate and retrieve a presigned URL for accessing the original file of a document by its ID.
The URL is valid for a limited time (e.g., 1 hour) and allows secure access to the document stored on DocuPipe.
- [Retrieve Detailed Processing Result](https://docs.docupipe.ai/reference/get_document_detailed.md): Access the fine-grained document parsing result. This representation includes individual word locations on the page.
- [Get Document Count](https://docs.docupipe.ai/reference/get_document_summary-1.md): Get a count of your documents, including the total number of documents, as well as the list of unique datasets.
- [Retrieve a Processed Document](https://docs.docupipe.ai/reference/get_document-1.md): Access the analysis results of your submitted document using this endpoint. The `status` field indicates the document's current processing stage, and the `result` field provides the extracted plain text for AI comprehension, as well as more granular structured information such as bounding boxes for detected tables and text blocks.
- [Get Schema Proposals](https://docs.docupipe.ai/reference/get_proposed_schemas.md): Get schema proposals for a document by providing the document's ID. The schema proposals are generated by the AI based on the document's content.
- [List Dataset Names](https://docs.docupipe.ai/reference/list_datasets-1.md): List all dataset names for the documents you have submitted so far.
- [List Documents](https://docs.docupipe.ai/reference/list_documents-1.md): List all documents that have been submitted to DocuPipe for processing. You can filter the results by providing a dataset name.
- [Submit a Document for Processing](https://docs.docupipe.ai/reference/post_document-1.md): Use this endpoint to submit a document to DocuPipe for processing. You can upload a local file or provide a URL to a remote file. Upon submission, you receive a unique `documentId`, which you may use to retrieve the document's results or to apply subsequent workflows to it. Max document size is 1500 pages and 3000 MB.
You may also provide a `workflowId` to apply a pre-defined workflow to the document.
- [Merge Documents](https://docs.docupipe.ai/reference/post_merge.md): Merge multiple documents into a single PDF document.
- [Split a Document](https://docs.docupipe.ai/reference/post_split_document.md): Split a document into multiple documents intelligently using AI. If no splitting is needed, no new documents will be created. Otherwise, the new sub-documents will be automatically generated.
- [Update Dataset](https://docs.docupipe.ai/reference/update_documents_dataset.md): Update the dataset of a list of documents by providing a list of document IDs and the new dataset name. This operation will update the dataset of all documents and standardizations associated with those documents.
- [Health Check Post](https://docs.docupipe.ai/reference/health_check_post.md): Health check endpoint to confirm the service is operational.
- [Health Check](https://docs.docupipe.ai/reference/health_check.md): Health check endpoint to confirm the service is operational.
- [Delete Jobs](https://docs.docupipe.ai/reference/delete_jobs-1.md): Delete multiple jobs that have been submitted to DocuPipe for processing. You can provide a list of job IDs to delete multiple jobs at once. Since jobs are just a record of events, deleting them will just hide them from you - the actual records will still be stored in the database. For specific jobs that contain actual data, such as Analyze-Document, the data will be deleted.
- [Get Job Count](https://docs.docupipe.ai/reference/get_job_summary-1.md): Get a count of your jobs broken down by Job Type. For each job type, you will see the number of jobs and number of credits consumed. The output includes 4 versions of the summary:
  1. All time summary with deleted jobs
  2. All time summary excluding deleted jobs
  3. Count since start_date (defaults to previous billing date, includes deleted jobs)
  4. Daily breakdown of credit usage, additionally broken down by job type.
To receive this, set `include_daily_usage` to true. You may pass `start_date` as an optional query parameter in ISO format (yyyy-mm-dd), and `include_daily_usage` as a boolean flag.
- [Retrieve a Job](https://docs.docupipe.ai/reference/get_job-1.md): Retrieve the details of a specific job by providing the job's ID. This will include the job's status, timestamp, and any other relevant information.
- [List Jobs](https://docs.docupipe.ai/reference/list_jobs-1.md): List all jobs that have been submitted to DocuPipe for processing. Every document upload, standardization, or credit-consuming event results in a job. This lets you audit your credit consumption. You can optionally filter the results by providing a date range in the format yyyy-mm-dd.
- [Delete Reviews](https://docs.docupipe.ai/reference/delete_reviews-1.md): This endpoint is used to delete multiple review objects. To delete a single review, pass a list containing one review ID.
- [Generate a Presigned URL for a Review](https://docs.docupipe.ai/reference/get_presigned_url.md): This endpoint generates a presigned URL containing a signature and expiration time for accessing or acting on a review.
- [Retrieve a Review by ID](https://docs.docupipe.ai/reference/get_review_by_id.md): This endpoint is used to retrieve a review object by its unique ID.
- [Retrieve review by standardization ID](https://docs.docupipe.ai/reference/get_standardization_review-1.md): This endpoint is used to retrieve a review object by its associated standardization ID.
- [List Reviews](https://docs.docupipe.ai/reference/list_reviews-1.md): This endpoint is used to list all review objects.
- [Generate a Visual Review](https://docs.docupipe.ai/reference/post_review_batch-1.md): This endpoint is used to generate a visual review of the standardization results.
For every value in the standardization payload, we generate a confidence score and a list of locations, where each location is a page number and an x1,y1,x2,y2 bounding box on that page, designating the top-left and lower-right corners of the bounding box. This indicates where in the document the value was found.
- [Update a Review](https://docs.docupipe.ai/reference/update_review.md): This endpoint is used to update a review object with new data or status.
- [Delete a Schema](https://docs.docupipe.ai/reference/delete_schema-1.md): Delete a schema by its ID.
- [Retrieve a Schema](https://docs.docupipe.ai/reference/get_schema-1.md): Retrieve an existing schema by providing the schema's ID.
- [List Schemas](https://docs.docupipe.ai/reference/list_schemas-1.md): List all of your schemas. The output here includes the jsonSchema data as well.
- [Copy a Schema to Another Workspace](https://docs.docupipe.ai/reference/post_copy_schema.md): Duplicate one of your schemas into another workspace that you administer. Pass the existing schemaId and the API key of the target workspace; the schema content and metadata are cloned as-is and assigned a new schemaId for the destination workspace.
- [Edit a Schema](https://docs.docupipe.ai/reference/post_edit_schema.md): Edit a schema by providing a schema ID and the parameters you want to edit. This does not create a new schema, but rather updates the existing schema. Changing the schema name is purely cosmetic, but changing the description and guidelines will affect the behavior of the schema for future standardizations. The things you can edit are:
  1. `schemaName` - The name of the schema
  2. `description` - The description of the schema
  3. `guidelines` - The guidelines for the schema
- [AutoGenerate a Schema](https://docs.docupipe.ai/reference/post_schema_autogenerate-1.md): Generate a schema based on a list of documents.
Leave the instructions empty if you want the AI to use its best judgment, or provide instructions to indicate your preference for how the schema should be generated. Best results are achieved when you provide a varied list of documents that represents the full range of document types you expect to process, and when you provide clear instructions about what you expect the schema to capture and how you want it to be structured.
- [Add a New Schema](https://docs.docupipe.ai/reference/post_schema.md): Create a new schema manually by posting a valid JSON schema that represents the structured output you want to extract from documents.
- [Bulk Download Standardization Excels](https://docs.docupipe.ai/reference/bulk_excel_download.md): Download multiple standardization results as Excel files in a single zip archive. Provide a list of standardization IDs and receive a presigned URL to download a zip file containing all the Excel files. Maximum 50 standardizations per request. The download URL expires after 24 hours.
- [Bulk Download Standardization XMLs](https://docs.docupipe.ai/reference/bulk_xml_download.md): Download multiple standardization results as XML files in a single zip archive. Provide a list of standardization IDs and receive a presigned URL to download a zip file containing all the XML files. Maximum 100 standardizations per request. The download URL expires after 24 hours.
- [Delete a Standardization](https://docs.docupipe.ai/reference/delete_standardization-1.md): Delete a standardization by providing the standardization ID.
- [Delete Multiple Standardizations](https://docs.docupipe.ai/reference/delete_standardizations-1.md): Delete multiple standardizations at once by providing a list of standardization IDs.
- [Download Excel URL](https://docs.docupipe.ai/reference/download_excel_url.md): Generate an Excel file from the standardization JSON by providing a standardization ID.
All arrays in the JSON will be put in a separate sheet, and all the non-array fields will be put in the main sheet. Returns a presigned URL to download the Excel file. This URL is valid for a limited time (e.g., 1 hour) and allows secure access to the Excel file stored on DocuPipe.
- [Get Standardization Count](https://docs.docupipe.ai/reference/get_standardization_summary-1.md): Get a count of your standardizations, including the total number as well as the list of unique schema names.
- [Retrieve a Standardization XML](https://docs.docupipe.ai/reference/get_standardization_xml.md): Retrieve the standardization results of a document as an XML object.
- [Retrieve a Standardization JSON](https://docs.docupipe.ai/reference/get_standardization-1.md): Retrieve the standardization results of a document as a JSON object.
- [List Standardizations](https://docs.docupipe.ai/reference/list_standardizations-1.md): Retrieve all standardizations of documents that have been processed using a specific schema.
- [Match a standardization to a list of candidates](https://docs.docupipe.ai/reference/match_standardization.md): Use this endpoint to match a standardization to a list of candidates. You can provide a standardization ID and a list of candidates. A candidate must have an ID and a record that details all its properties. You can optionally provide instructions to clarify the task rules.
- [Query Standardizations](https://docs.docupipe.ai/reference/post_query.md): Query the documents you have standardized using free-form language. The query should be written in plain language and should describe what you're looking for, e.g. "Find all the documents where the rental is above $1000 in San Francisco". The endpoint will return a list of standardizations that match the query, along with AI-generated feedback on the query. The maximum number of standardizations returned is 200.
- [Standardize Documents](https://docs.docupipe.ai/reference/post_standardize_batch_v2.md): Standardize a batch of documents, either by passing a list of Document IDs or by passing a dataset name. Pass a schemaId to standardize the documents using a specific structure, or leave it empty to create an ad-hoc structure as the AI sees fit. Standardization handles lists (arrays) by splitting documents into smaller sub-documents behind the scenes - the AI will do its best to decide how and when it is appropriate to split. You can specify certain parameters; by default they are left at `auto`, which lets the AI decide.
  1. `displayMode` - Controls how the AI sees the document. The options are:
     - `auto` - Automatically determine the best mode based on the document content.
     - `spatial` - Represent text in the document according to its spatial layout.
     - `sections` - Represent the document as a list of sections (paragraphs, tables, images, etc.) as seen in the web UX.
  2. `splitMode` - Controls how the AI splits the document into sub-documents. The options are:
     - `auto` - Automatically determine the best mode based on the document content.
     - `all` - Split the document into single-page sub-documents, so each page is handled separately.
     - `never` - Do not split the document at all, so the entire document is handled as a single unit. This can lead to poor performance for long documents, or documents with lots of dense data that needs to be extracted.
  3. `effortLevel` - Controls how much effort the AI puts into the standardization. The options are:
     - `standard` - Use the standard effort level.
     - `high` - Use the high effort level, which takes longer but can produce better results. Costs +2 credits per page.
- [Deregister an Endpoint](https://docs.docupipe.ai/reference/delete_endpoint-1.md): Deregister a webhook endpoint for your application; it will stop receiving all events. You can also manage this in our dashboard portal under account/settings on the DocuPipe website.
- [Register an Endpoint](https://docs.docupipe.ai/reference/generate_endpoint-1.md): Generate a webhook endpoint for your application. The specified URL will receive ALL events. If you want to define a more granular specification, use our dashboard portal under account/settings on DocuPipe's website.
- [Get Webhook Portal URL](https://docs.docupipe.ai/reference/get_portal_link-1.md): Generates a magic link that logs you in to the app portal. From the portal you can configure webhook subscriptions in a user-friendly interface.
- [Delete a Workflow](https://docs.docupipe.ai/reference/delete_workflow-1.md): Delete a workflow by providing the workflow ID.
- [List your Workflows](https://docs.docupipe.ai/reference/list_workflows-1.md): This endpoint returns a list of all your workflows.
- [Create a Workflow](https://docs.docupipe.ai/reference/post_workflow_on_submit_document-1.md): Use this endpoint to create a workflow that triggers when a document is submitted. To run the workflow, use the `POST /document` endpoint with the `workflowId` that gets returned from this endpoint. You must provide exactly one of the following inputs, which configure the workflow to either:
  1. Always run the specified schema(s) on the document, set via the `standardizeStep`.
  2. Always run the specified schema(s) on the document, then follow up with a review step, set via the `standardizeReviewStep`.
  3. Split the document and standardize each resulting sub-document, set via the `splitStandardizeStep`.
  4. Conditionally run one or more schemas based on the document's `classId`, set via the `classifyStandardizeStep`.
  5. Split the document, classify each resulting sub-document, and standardize based on class-to-schema mappings, set via the `splitClassifyStandardizeStep`.
- [Update a Workflow](https://docs.docupipe.ai/reference/update_workflow.md): Update an existing workflow by posting the same parameters as `POST /workflow/on-submit-document`.
The workflow will retain its original `workflowId`.
- [Upload and Standardize Multiple](https://docs.docupipe.ai/reference/upload-and-standardize-multiple.md): Upload multiple documents to DocuPipe and standardize them, then retrieve the results.
- [Workflow: Upload and Standardize](https://docs.docupipe.ai/reference/upload-and-standardize-using-workflow.md): Upload and standardize a document in a single POST request using workflows.
- [Upload Multiple Documents](https://docs.docupipe.ai/reference/upload-multiple.md): Upload multiple documents to DocuPipe and retrieve the results.
- [Upload](https://docs.docupipe.ai/reference/upload.md): Upload a document to DocuPipe and retrieve the OCR parsing results.
- [Workflow: Upload, Classify and Standardize](https://docs.docupipe.ai/reference/workflow-upload-classify-and-standardize.md): Upload and classify a document, and then standardize for certain classes, all in a single POST request using workflows.
- [API Rate Limits](https://docs.docupipe.ai/reference/docupipe-rate-limits.md)
- [Getting Started With DocuPipe API](https://docs.docupipe.ai/reference/getting-started-with-docupipe.md): Take your first steps by uploading a document and getting its parsed text and tables.
- [Using LLMs With These Docs](https://docs.docupipe.ai/reference/using-llms-with-these-docs.md): Learn how to copy and paste into ChatGPT like a boss.
- [Webhooks](https://docs.docupipe.ai/reference/webhooks.md): Architect event-driven workflows.
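
As an illustration of the `Standardize Documents` parameters described above, the following is a minimal Python sketch that checks `displayMode`, `splitMode`, and `effortLevel` against the listed options and assembles a request body. The option values and the `schemaId` name come from that entry; the function name `build_standardize_payload` and the `documentIds` field name are assumptions made for this example, and the sketch covers only the document-ID form, not the dataset-name form. Check the linked reference page for the actual request shape before relying on it.

```python
from typing import Optional

# Option values copied from the Standardize Documents entry in this index.
DISPLAY_MODES = {"auto", "spatial", "sections"}
SPLIT_MODES = {"auto", "all", "never"}
EFFORT_LEVELS = {"standard", "high"}


def build_standardize_payload(
    document_ids: list[str],
    schema_id: Optional[str] = None,
    display_mode: Optional[str] = None,
    split_mode: Optional[str] = None,
    effort_level: Optional[str] = None,
) -> dict:
    """Assemble a request body, rejecting option values that are not listed.

    Parameters left as None are omitted, so the service's own defaults apply.
    """
    if not document_ids:
        raise ValueError("at least one document ID is required")

    # "documentIds" is a hypothetical field name used for illustration only.
    payload: dict = {"documentIds": list(document_ids)}
    if schema_id is not None:
        payload["schemaId"] = schema_id

    checks = (
        ("displayMode", display_mode, DISPLAY_MODES),
        ("splitMode", split_mode, SPLIT_MODES),
        ("effortLevel", effort_level, EFFORT_LEVELS),
    )
    for key, value, allowed in checks:
        if value is None:
            continue  # not set by the caller, so leave it out of the body
        if value not in allowed:
            raise ValueError(
                f"{key} must be one of {sorted(allowed)}, got {value!r}"
            )
        payload[key] = value
    return payload
```

Validating on the client side like this catches a misspelled option before a request is sent; whether the service itself rejects unknown values is not stated in this index.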