Handle throttling and plan around request limits

Every request to the mintfax API counts against a per-workspace rate limit enforced at the API Gateway. This page covers the limit structure, the rate-limit response headers, the shape of the 429 response, and client backoff strategies.

How limits are applied

Rate limits are scoped to the API key that authenticates the request. Each API key belongs to a workspace, so the limit is effectively per-workspace. Sandbox and live workspaces have independent limits. The API Gateway enforces a sliding window measured in requests per minute. Every response includes headers showing where you stand within that window. Your client can read these and pace itself before hitting the ceiling.

Response headers

Three rate-limit headers appear on every API response, regardless of whether the request succeeded.

Header	Type	Description
`X-RateLimit-Limit`	integer	Maximum requests allowed per minute for this key
`X-RateLimit-Remaining`	integer	Requests remaining in the current one-minute window
`X-RateLimit-Reset`	integer	Seconds until the current window resets

When the API rejects a request with HTTP 429, the response also includes a Retry-After header. Its value is the number of seconds to wait before sending the next request.

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 34
Retry-After: 34

Edge cases

X-RateLimit-Remaining can reach 0 before you see a 429 if multiple requests are in flight simultaneously.
X-RateLimit-Reset starts counting down when the window opens and keeps counting until the window resets. Whether an individual request succeeds or fails does not affect it.
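
As a minimal sketch of reading these headers, the helper below collects the three rate-limit headers and the optional Retry-After value into one object. The names `RateLimitState` and `parse_rate_limit_headers` are illustrative and not part of any mintfax SDK. The lookup assumes a case-insensitive mapping such as `response.headers` from `requests`; with a plain dict the header names must match exactly.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int                         # X-RateLimit-Limit
    remaining: int                     # X-RateLimit-Remaining
    reset_seconds: int                 # X-RateLimit-Reset
    retry_after: Optional[int] = None  # Retry-After, only present on 429


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    """Convert the documented rate-limit headers into integers."""
    retry_after = headers.get("Retry-After")
    return RateLimitState(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_seconds=int(headers["X-RateLimit-Reset"]),
        retry_after=int(retry_after) if retry_after is not None else None,
    )
```

Applied to the sample 429 response above, this would yield a limit of 60, 0 requests remaining, and 34 seconds for both the reset and the retry delay.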

429 response body

A rate-limited request returns the standard error envelope with the rate_limit_exceeded code.

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Please slow down and try again.",
  "action": "wait_and_retry",
  "docs": "https://mintfax.com/docs/errors/rate-limit-exceeded"
}

The action field is wait_and_retry. Read the Retry-After header and pause for that many seconds before retrying.

Edge cases

Retry-After is always in whole seconds, never a date.
Retrying before Retry-After elapses produces another 429 with a fresh Retry-After value.
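
One way a client might recognize this response is sketched below. The function name `retry_delay_for` is hypothetical, and the fallback of 1 second when Retry-After is missing is an assumption that mirrors the backoff sample later on this page, not documented mintfax behavior.

```python
import json
from typing import Mapping, Optional


def retry_delay_for(status_code: int, headers: Mapping[str, str],
                    body_text: str) -> Optional[int]:
    """Return the seconds to wait for a rate-limit rejection, else None."""
    if status_code != 429:
        return None
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return None
    # Only treat the response as throttling when the envelope says so.
    if not isinstance(body, dict) or body.get("error") != "rate_limit_exceeded":
        return None
    # Retry-After is whole seconds, so int() is sufficient.
    # Assumption: fall back to 1 second if the header is absent.
    return int(headers.get("Retry-After", 1))
```

Returning None for everything else lets the caller route other errors through its normal error handling.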

Sandbox and live limits

Sandbox and live workspaces enforce separate rate limits. Each workspace’s API keys count against that workspace’s own quota. Hitting the limit on a sandbox key does not affect your live workspace, and the reverse is also true.

Designing clients that respect rate limits

Read the headers before you need them

Check X-RateLimit-Remaining on every response. When it drops below a threshold you consider safe (10% of X-RateLimit-Limit, for example), slow down proactively instead of waiting for a 429.
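
A rough sketch of that proactive slowdown follows. The function `pacing_delay` and its strategy of spreading the remaining requests evenly across the rest of the window are illustrative choices, not something mintfax prescribes. The 10% threshold is the example value from the paragraph above.

```python
def pacing_delay(limit: int, remaining: int, reset_seconds: int,
                 threshold: float = 0.10) -> float:
    """Return how many seconds to pause before sending the next request."""
    # Plenty of headroom left: send immediately.
    if remaining > limit * threshold:
        return 0.0
    # Nothing left in this window: wait for the reset.
    if remaining <= 0:
        return float(reset_seconds)
    # Below the threshold: spread what is left over the rest of the window.
    return reset_seconds / remaining
```

A caller would pass the integer values of X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset from the latest response and sleep for the returned duration before the next request.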

Use exponential backoff on 429

When you receive a 429, wait for the number of seconds in Retry-After, then retry. If the retry also comes back 429, double the wait time. Cap the backoff at 60 seconds.

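import time
import requests

def send_with_backoff(url, headers, data, max_retries=5):
    """POST to url, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, data=data)

        # Anything other than 429 (success or another error) goes back to the caller.
        if response.status_code != 429:
            return response

        # No retry follows the final attempt, so skip the pointless sleep.
        if attempt == max_retries - 1:
            break

        # Start from Retry-After, double on each consecutive 429, cap at 60 seconds.
        retry_after = int(response.headers.get("Retry-After", 1))
        wait = min(retry_after * (2 ** attempt), 60)
        time.sleep(wait)

    # Still rate-limited after max_retries attempts: return the last 429.
    return response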

Combine with idempotency keys

When retrying a POST /fax request after a 429, use the same idempotency key on the retry. If the original request was actually accepted before the 429 reached your client, the retry returns the original response instead of creating a duplicate fax.
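
The sketch below shows the idea of generating the key once and reusing it on every attempt. Several details are assumptions: this page does not name the idempotency header, so `Idempotency-Key` is a placeholder to check against the API reference, and `send` is a hypothetical callable that performs the POST /fax request and returns a `(status_code, response_headers, body)` tuple.

```python
import time
import uuid


def post_fax_with_retry(send, payload, max_retries=5, sleep=time.sleep):
    """Retry a POST /fax on 429, reusing one idempotency key throughout."""
    # Generate the key once, outside the loop, so every retry carries it.
    # Assumption: the header is called "Idempotency-Key".
    request_headers = {"Idempotency-Key": str(uuid.uuid4())}
    status, body = None, None

    for attempt in range(max_retries):
        status, response_headers, body = send(request_headers, payload)
        if status != 429:
            return status, body
        if attempt < max_retries - 1:
            # Wait as instructed by Retry-After, capped at 60 seconds.
            sleep(min(int(response_headers.get("Retry-After", 1)), 60))

    return status, body
```

Creating the key inside the loop would defeat the purpose, because each retry would then look like a brand-new fax to the server.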

Avoid polling loops

Do not poll GET /fax/{id} in a tight loop to check delivery status. Use webhooks instead. Webhooks push status updates to your server as they happen, so you do not burn rate-limit quota tracking fax progress. If you must poll, space requests at least 5 seconds apart and stop once the fax reaches a terminal state (delivered or failed).
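
If polling is unavoidable, a loop along these lines keeps it within the guidance above. `get_status` is a hypothetical callable that wraps GET /fax/{id} and returns the fax status as a string. The `max_polls` ceiling is an added safeguard and not a documented limit.

```python
import time

# Terminal states named in the guidance above.
TERMINAL_STATES = {"delivered", "failed"}


def poll_until_terminal(get_status, interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll at least 5 seconds apart and stop at a terminal state."""
    interval = max(interval, 5.0)  # never poll faster than every 5 seconds
    status = None
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
    # Gave up before a terminal state; return the last status seen.
    return status
```

Each call to `get_status` still counts against the rate limit, which is why webhooks remain the better option.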

Agent and LLM callers

If an LLM or agent is calling the API, pass the X-RateLimit-Remaining and Retry-After values back in the context the agent receives. This lets the agent adjust pacing without guessing. Hard-code the backoff logic in your integration layer so the agent cannot override it and flood requests.
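
As an illustration, the integration layer could attach a small summary like the one below to each tool result it hands back to the agent. The function name and the dictionary keys are hypothetical; only the header names come from this page.

```python
from typing import Dict, Mapping, Optional


def rate_limit_context(status_code: int,
                       headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Summarize rate-limit state for inclusion in an agent's context."""
    def as_int(name: str) -> Optional[int]:
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "status_code": status_code,
        "rate_limit_remaining": as_int("X-RateLimit-Remaining"),
        # None unless the response was a 429 carrying Retry-After.
        "retry_after_seconds": as_int("Retry-After"),
    }
```

The agent only sees these values. The actual waiting still happens in the integration layer, so the agent cannot skip it.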

Last updated: 2026-05-09
