DocuPipe API Rate Limits

To maintain a fair and stable environment for all developers, the DocuPipe REST API enforces rate limits. These limits govern the volume and pace of your requests, ensuring consistent platform performance for everyone. Rate limits are enforced independently of your billing credits (the units you purchase or receive under your plan).

While billing credits are tied to cost and monthly usage allowances, rate limit tokens are free. They measure the velocity at which you access our API, and they are in place to protect against abuse and excessive traffic.

You may use up all your monthly billing credits without once hitting a rate limit, or you might hit rate limits even if you have ample billing credits. Understanding and respecting these two separate concepts will help you manage both your costs and your application’s performance.

🚧

Note that rate limit tokens in this document refer to API usage. Do not conflate them with LLM tokens, which are an entirely different and unrelated concept.


Rule of Thumb

If you don't want to dive into the technical details, here's a simple guideline for staying within the default rate limits:

  • Submit documents at ~1 POST per second (1 document per second).
  • Poll for results using GET with exponential backoff — start with a short interval (e.g., 1 second), then double it on each retry until results are ready.
  • Use webhooks whenever possible — webhooks notify you when processing completes, eliminating the need to poll entirely. This is our recommended approach.

Following this pattern, you'll comfortably stay within the default rate limits without needing to track tokens or bucket capacity. Read on if you want to understand the full details or optimize for higher throughput.
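The polling half of this pattern can be sketched in a few lines. This is a minimal sketch, not official client code: `fetch_result` is a hypothetical callable standing in for whatever GET request your application makes, returning the finished result or `None` while processing is still in progress.

```python
import time

def poll_with_backoff(fetch_result, initial_delay=1.0, max_delay=60.0, max_attempts=10):
    """Poll fetch_result() with exponential backoff until a result is ready.

    fetch_result is any callable (hypothetical here) that returns the finished
    result, or None while processing is still in progress.
    """
    delay = initial_delay
    for _ in range(max_attempts):
        result = fetch_result()
        if result is not None:
            return result
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # double the interval on each retry
    raise TimeoutError("result not ready after max_attempts polls")
```

Starting at 1 second and doubling keeps GET volume well under the default budget even when many documents are in flight at once.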


Understanding Our Rate Limits

Token-Based Request Costs

Every API request consumes rate limit tokens from your allocated bucket. This rate-based accounting system is separate from billing credits. Rate limit tokens never cost money; they only limit how rapidly you can make calls. The cost in tokens depends on the HTTP method used:

  • GET requests: Cost 1 token each
  • POST requests: Cost 10 tokens each
  • DELETE requests: Cost 10 tokens each

We don’t differentiate between API endpoints for rate limits. Only the HTTP method affects the number of tokens consumed.
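Because only the HTTP method matters, budgeting a batch of requests is simple arithmetic. A small sketch (the helper names here are illustrative, not part of any DocuPipe SDK):

```python
# Rate limit token cost per HTTP method, as documented above.
TOKEN_COST = {"GET": 1, "POST": 10, "DELETE": 10}

def batch_cost(posts: int, gets: int, deletes: int = 0) -> int:
    """Total rate limit tokens consumed by a mix of requests."""
    return (posts * TOKEN_COST["POST"]
            + gets * TOKEN_COST["GET"]
            + deletes * TOKEN_COST["DELETE"])
```

For example, uploading 10 documents and polling 50 times costs 10 * 10 + 50 * 1 = 150 tokens, regardless of which endpoints those requests hit.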

Steady-State and Bursting

Subscribed non-enterprise users receive a continuous replenishment of rate limit tokens at 1,800 tokens per minute. This steady influx supports:

  • About 3 POST/DELETE requests per second (3 * 10 tokens = 30 tokens per second, or 1,800 tokens per minute).
  • About 30 GET requests per second (30 * 1 token = 30 tokens per second, again totaling 1,800 tokens per minute).

To handle short-term spikes, we employ a leaky bucket algorithm with a maximum capacity of 3 times your per-minute refill rate. At 1,800 tokens/min, this equates to a bucket of 5,400 tokens at full capacity. This reservoir lets you exceed the steady-state rate for short periods until these tokens are used up, after which you must wait while tokens refill at the normal rate.


The Leaky Bucket Algorithm

Think of your rate limit like a bucket of tokens:

  1. Bucket Capacity: Holds up to 5,400 tokens.
  2. Refill Rate: Adds 1,800 tokens per minute continuously.
  3. Cost Per Request:
    • GET requests consume 1 token.
    • POST and DELETE requests consume 10 tokens.

If you have enough tokens, your request succeeds. If you repeatedly exceed the steady-state rate, you’ll eventually drain the bucket. Once empty, further requests fail with a rate limit error until tokens replenish.
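The mechanics above can be modeled client-side to predict when a request would be rate limited. This is a simplified simulation of the algorithm as described, using the default non-enterprise numbers (capacity 5,400, refill 1,800/minute); it is not the server's actual implementation.

```python
import time

class LeakyBucket:
    """Client-side model of the documented limits: bucket capacity 5,400
    tokens, continuous refill at 1,800 tokens per minute (30 per second)."""

    def __init__(self, capacity=5400, refill_per_minute=1800):
        self.capacity = capacity
        self.refill_per_second = refill_per_minute / 60.0
        self.tokens = float(capacity)          # bucket starts full
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last = now

    def try_consume(self, cost):
        """Spend `cost` tokens if available; return False if the request
        would be rate limited."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With a full bucket, 540 back-to-back POSTs (540 * 10 = 5,400 tokens) succeed, and the very next one fails until roughly a third of a second of refill has accrued.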


Example Usage Scenario

  • Short burst: Need to issue 50 GET requests instantly? Even though this is more than the “per second” baseline, the bucket’s capacity allows it. Your burst is absorbed by the stored tokens.
  • Sustained load: Continually making more than the “steady-state” number of calls over time will drain your bucket. Once empty, you must slow down or pause until it refills.

Rate Limit Headers

We provide helpful headers in each response so you can track and manage your rate limit usage in real-time.

  • X-RateLimit-Limit: Maximum size of your token bucket (e.g., 6000 means you can accumulate up to 6,000 tokens if you pause requests for a while).
  • X-RateLimit-Remaining: Tokens still available after this request.
  • X-RateLimit-Reset: The UTC epoch timestamp when your bucket would be fully refilled if it were empty now. (In practice, tokens refill continuously, but this gives you a reference point.)
  • X-RateLimit-Used: How many tokens the request you just made consumed (currently this will always be 1 for GET, 10 for POST/DELETE).

Example Response Headers

X-RateLimit-Limit: 6000
X-RateLimit-Remaining: 5990
X-RateLimit-Reset: 1700000000
X-RateLimit-Used: 10

Here, the bucket can hold 6,000 tokens. After this request, 5,990 remain. This particular request cost you 10 tokens. The X-RateLimit-Reset value indicates when a fully drained bucket would be back at full capacity, expressed as a Unix timestamp.
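You can read these headers on every response to throttle proactively instead of waiting for a 429. A minimal sketch, assuming `headers` is a plain dict of response headers (the `should_throttle` helper and its 10% floor are illustrative choices, not an official recommendation):

```python
def parse_rate_limit(headers):
    """Extract rate limit state from a response's headers."""
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset_epoch": int(headers["X-RateLimit-Reset"]),
        "used": int(headers["X-RateLimit-Used"]),
    }

def should_throttle(headers, floor=0.1):
    """Hypothetical policy: slow down once fewer than `floor` (default 10%)
    of the bucket's tokens remain."""
    info = parse_rate_limit(headers)
    return info["remaining"] < info["limit"] * floor
```

Checking `remaining` against a floor like this lets a client back off gracefully before ever draining the bucket.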


Handling Rate Limit Errors

If you deplete your bucket’s tokens, you’ll receive a 429 Too Many Requests error. At that point:

  1. Do not retry immediately.
  2. Let tokens replenish. Wait a random interval between 10 and 60 seconds before retrying so your clients don't stampede the API at the same moment (the classic thundering herd effect).
  3. Understand the root cause. Excessive polling is the most common reason for rate limiting. Slow the polling frequency, add exponential backoff, and ensure loops eventually stop.
  4. Use webhooks to avoid unnecessary GET requests. Webhooks eliminate the need to poll for document, classification, or standardization results.
  5. Leverage Workflows. A workflow chains multiple steps (e.g. upload -> classify -> standardize) into a single API call, reducing POST volume and polling.
  6. Reach out if you need higher limits. If you legitimately need more throughput, contact DocuPipe Support. Enterprise plans can scale to handle hundreds of millions of documents per day.
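Steps 1 and 2 above can be sketched as a retry wrapper. This is a minimal example, not official client code: `send` is a hypothetical callable that performs one API request and returns a `(status_code, body)` pair.

```python
import random
import time

def request_with_retry(send, max_retries=5, min_wait=10.0, max_wait=60.0):
    """Call send() (hypothetical: returns (status_code, body)). On a 429,
    wait a random 10-60 seconds before retrying, so many clients don't
    all retry at the same moment (avoiding the thundering herd)."""
    for _ in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        time.sleep(random.uniform(min_wait, max_wait))  # jittered wait
    raise RuntimeError("still rate limited after max_retries attempts")
```

The randomized interval is the key detail: a fixed wait would synchronize all your clients into retrying simultaneously, reproducing the original spike.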

Getting More Capacity

If your application requires higher throughput or more generous bursting capacity, consider upgrading to an enterprise plan. Enterprise customers can receive adjusted rate limit configurations. For more information, contact our team by clicking the chat icon at the bottom right of this page, or by submitting a request.


Summary

DocuPipe’s rate limits ensure that everyone shares a stable, responsive platform. By monitoring your token usage, adjusting your request strategy, and understanding the difference between billing credits and rate limit tokens, you can maintain smooth, uninterrupted access to the DocuPipe REST API—even under periods of high demand.