Merging Documents
How to combine several documents into one, and how this differs from a Merged Excel download
Overview
Merging combines two or more documents into a single new document. The pages are joined in the order you choose, and the result is a brand-new document with its own parsed text. The original files stay exactly as they were.
Reach for it when one logical document arrived as several separate files and you want to process it as one unit. Examples:
- A lease or contract that was scanned in batches (pages 1-20, then 21-40, ...) and uploaded as separate files.
- A monthly statement or report whose pages were uploaded individually and that you want to run through a single Schema.
- A set of single-page images (e.g. phone photos of receipts) that together make up one record.
Merging documents is not the same as downloading a "Merged Excel" file. Merge (this feature) combines the source documents into one new document. Merged Excel is a download format that combines the extraction results of several Standardizations into one spreadsheet. If your goal is a single spreadsheet containing all your data, you probably want Merged Excel, not Merge - see Downloading Results.
How to merge
- Go to the Documents page and select two or more documents using the checkboxes.
- Click the Actions button and choose Merge. A three-step window opens: Select -> Configure -> Merge.
- Select - confirm the documents you want to combine. You can search and filter by dataset to pull in more.
- Configure - drag the rows (or use the up/down arrows) to set the page order, remove any you don't want (you need at least two), and give the merged document a filename. You can also assign it to a dataset.
- Merge - run it. The job processes in the background; watch the Jobs page for the green checkmark. The new merged document then shows up on the Documents page.
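To make the ordering rule concrete, here is a small conceptual model in Python. It is not the product's implementation or an API: `Document`, `merge_documents`, and the sample filenames are hypothetical, and each document is reduced to a filename plus an ordered sequence of page labels. It illustrates three behaviors described in this article: pages follow the row order set in Configure, a merge needs at least two documents, and the result is a new document while the originals stay unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a document: a filename plus its pages in order."""
    filename: str
    pages: tuple[str, ...]


def merge_documents(ordered_docs: list[Document], filename: str) -> Document:
    """Conceptual model of Merge: join pages in the chosen order into a new document."""
    if len(ordered_docs) < 2:
        raise ValueError("select at least two documents to merge")
    # Pages follow the row order from the Configure step: all of the first
    # document's pages, then all of the second's, and so on.
    pages = tuple(page for doc in ordered_docs for page in doc.pages)
    # A new Document is returned; the frozen inputs are never modified.
    return Document(filename=filename, pages=pages)


# Hypothetical example: a lease uploaded as two separate files.
part1 = Document("lease_part1.pdf", ("p1", "p2"))
part2 = Document("lease_part2.pdf", ("p3", "p4"))
merged = merge_documents([part1, part2], "lease.pdf")
print(merged.pages)  # ('p1', 'p2', 'p3', 'p4')
```

Swapping the two rows (`[part2, part1]`) would put pages p3 and p4 first, which is why it is worth checking the order in Configure before you run the merge.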
After merging: run Standardize
A merged document is a fresh document with no extraction on it yet. To pull data out of it, run a Standardization on the merged document with the right Schema, exactly as you would for any other document.
Merging combines documents, not extractions. Any Standardization (or other work) you already ran on the original documents stays attached to them and does not carry over to the merged document.
Good to know
- The original documents are left untouched - merge always creates a new, separate document.
- The merged document is parsed automatically, so it's ready to standardize, classify, or split right away.
- Pages are combined in the exact order shown in the Configure step.
- Merging is very cheap - 0.01 credits per page, rounded up. See Understanding Credits and Billing for how credits work.
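As a rough illustration of the per-page rate, the sketch below estimates the cost of a merge from the page counts of its source documents. It assumes the charge is based on the total number of pages in the merged document and that the total is rounded up to the next whole credit; `estimate_merge_credits` is a hypothetical helper, not part of the product, and Understanding Credits and Billing is the authority on the exact rounding rule.

```python
# Hypothetical helper. Assumptions: 0.01 credits per page, charged on the total
# page count of the merged document, rounded UP to the next whole credit.
PAGES_PER_CREDIT = 100  # 0.01 credits per page == 1 credit per 100 pages


def estimate_merge_credits(page_counts: list[int]) -> int:
    """Estimate the credits for merging documents with the given page counts."""
    if len(page_counts) < 2:
        raise ValueError("a merge needs at least two documents")
    total_pages = sum(page_counts)
    # Integer ceiling division avoids floating-point error from 0.01 * pages.
    return -(-total_pages // PAGES_PER_CREDIT)


print(estimate_merge_credits([20, 20]))   # 1  (40 pages -> 0.4, rounded up)
print(estimate_merge_credits([120, 80]))  # 2  (200 pages -> exactly 2)
```

Working in whole pages and dividing with integers keeps the estimate exact; multiplying the page count by 0.01 as a float can produce values like 7.000000000000001 that would round up one credit too far.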
