TokenHub enforces rate limits on every API key to ensure fair usage across all customers and to protect system stability. Limits are measured in requests per minute (RPM) and tokens per minute (TPM). When you exceed a limit, the API returns a 429 Too Many Requests response until the window resets.
Default limit tiers
| Plan | Requests per minute (RPM) | Tokens per minute (TPM) |
|---|---|---|
| Free | 60 | 100,000 |
| Pro | 600 | 1,000,000 |
| Enterprise | Custom | Custom |
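
As a rough planning aid, the default tiers can be turned into a client-side pacing interval and an average per-request token budget. This is only a sketch: the `DEFAULT_LIMITS` dictionary and both helper names are hypothetical, and spacing requests evenly is one conservative client-side choice, not something the API requires.

```python
# Default tiers copied from the table above. Enterprise limits are custom,
# so they are left out. All names here are illustrative, not a TokenHub API.
DEFAULT_LIMITS = {
    "free": {"rpm": 60, "tpm": 100_000},
    "pro": {"rpm": 600, "tpm": 1_000_000},
}


def min_request_interval(plan: str) -> float:
    """Seconds between requests if they are spaced evenly across one minute."""
    return 60.0 / DEFAULT_LIMITS[plan]["rpm"]


def avg_token_budget(plan: str) -> float:
    """Average tokens per request that keeps full-RPM traffic within TPM."""
    limits = DEFAULT_LIMITS[plan]
    return limits["tpm"] / limits["rpm"]


if __name__ == "__main__":
    for plan in DEFAULT_LIMITS:
        print(plan, min_request_interval(plan), round(avg_token_budget(plan), 2))
```

On both default plans, a client sending at the full request rate would need to average roughly 1,667 tokens per request to stay inside the token limit.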
Rate limit headers
Every API response includes headers that tell you your current limit status:

| Header | Description |
|---|---|
| X-RateLimit-Limit-Requests | Maximum number of requests allowed in the current window. |
| X-RateLimit-Remaining-Requests | Number of requests remaining before you hit the limit. |
| X-RateLimit-Reset-Requests | Unix timestamp (seconds) when the request window resets. |
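
A client can read these headers to decide whether to pause before its next call. The sketch below assumes the headers arrive as a string-to-string mapping keyed exactly as in the table; `seconds_until_reset` and `should_pause` are hypothetical helper names, not part of any TokenHub SDK.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Seconds until the request window resets (never negative)."""
    now = time.time() if now is None else now
    # The reset header is a Unix timestamp in seconds.
    reset_at = float(headers["X-RateLimit-Reset-Requests"])
    return max(0.0, reset_at - now)


def should_pause(headers: Mapping[str, str], min_remaining: int = 1) -> bool:
    """True when fewer than `min_remaining` requests are left in the window."""
    remaining = int(headers["X-RateLimit-Remaining-Requests"])
    return remaining < min_remaining


if __name__ == "__main__":
    example = {
        "X-RateLimit-Limit-Requests": "60",
        "X-RateLimit-Remaining-Requests": "0",
        "X-RateLimit-Reset-Requests": str(int(time.time()) + 30),
    }
    if should_pause(example):
        print(f"pause for about {seconds_until_reset(example):.0f}s")
```

Many HTTP libraries expose headers through a case-insensitive mapping; with a plain `dict`, the keys must match the casing shown here.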
Handling 429 errors
When you exceed your rate limit, the API responds with a 429 Too Many Requests status. The Retry-After header tells you the number of seconds to wait before retrying. The safest strategy is exponential backoff with jitter: wait progressively longer between retries, and add a small random component so that many clients do not retry at the same moment.
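
One way to implement this is sketched below. It is a minimal illustration, not an official client: `send` is a hypothetical callable that performs one request and returns a `(status, headers)` tuple, and the defaults (`base`, `cap`, `max_attempts`) are assumptions rather than TokenHub recommendations. The delay is the server's Retry-After value, when present, plus a random amount drawn from an exponentially growing range.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    The jitter is uniform in [0, min(cap, base * 2**attempt)) and is added
    on top of Retry-After, so the server's minimum wait is always respected.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return (retry_after or 0.0) + rng() * ceiling


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        raw = headers.get("Retry-After")  # seconds, per the text above
        retry_after = float(raw) if raw is not None else None
        sleep(backoff_delay(attempt, retry_after))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Simulated responses stand in for real API calls.
    fake = iter([(429, {"Retry-After": "1"}), (200, {})])
    print(call_with_retries(lambda: next(fake)))
```

Passing `sleep` and `rng` as parameters keeps the retry logic easy to test without real waiting or network calls. If the final attempt is still rate limited, the 429 response is returned to the caller rather than raised.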
TokenHub rate limits are separate from any limits enforced by the underlying providers. Even if you are within your TokenHub limits, a provider may still throttle the request on its end. TokenHub will surface those errors with the same 429 status code and attempt automatic retries according to your routing configuration.