Auto-routing
PrivateRouter ships a special model name, privaterouter/auto, that
lets you delegate model selection to the platform. When you call it, the
router picks the right underlying model for you and — if your preferred
model times out — automatically falls back through your configured chain.
Auto-routing is disabled by default. It's an opt-in feature: until
you turn it on, privaterouter/auto simply routes to the cheapest
healthy model on your plan.
Included on every plan. There's no surcharge for using privaterouter/auto
— you pay for the model that actually served the request, billed at
that model's normal per-token rate.
Quickstart
Drop-in replacement for any OpenAI-compatible client:
curl https://api.privaterouter.com/v1/chat/completions \
-H "Authorization: Bearer $PRIVATEROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "privaterouter/auto",
"messages": [{"role": "user", "content": "Summarize the last commit message."}]
}'
The response's model field reflects the model that actually served
your request, not the privaterouter/auto sentinel. That makes it useful
for billing reconciliation and debugging.
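The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_chat_request` and `served_model` are made up for illustration, and the response is assumed to be an OpenAI-compatible chat-completions body with a top-level `model` field, as described above.

```python
import json
import urllib.request

API_URL = "https://api.privaterouter.com/v1/chat/completions"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for privaterouter/auto."""
    body = {
        "model": "privaterouter/auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def served_model(response_body: dict) -> str:
    """Return the model that actually served the request (not the sentinel)."""
    return response_body["model"]
```

To send it, pass the request to `urllib.request.urlopen`, decode the JSON body, and hand the result to `served_model`; logging that value next to your own request ID is one way to reconcile invoices later.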
Configuring via the UI
The simplest way: open the Routing page in your dashboard
(/routing). There you can:
- Toggle Enable auto-routing to opt in.
- Pick a Preferred model (or leave on Cheapest healthy).
- Build your Fallback chain — up to 10 models, tried in order on timeout.
- Tune Per-attempt timeout (1 – 120s) and Max attempts (1 – 10) under Advanced.
A small cyan badge at the top of the page shows AUTO-ROUTING ACTIVE
when enabled.
Configuring via the API
The same settings are exposed at PUT /api/routing/policy. This endpoint
uses session auth, not API-key auth:
curl -X PUT https://api.privaterouter.com/api/routing/policy \
-H "Cookie: pr_session=$SESSION_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"enabled": true,
"preferred_model_public_name": "privaterouter/qwen-pro",
"fallback_chain_public_names": [
"privaterouter/qwen-fast",
"privaterouter/fast"
],
"timeout_ms": 20000,
"max_attempts": 3
}'
GET /api/routing/policy returns the current policy. If you have never
saved one, the first call creates a default policy with auto-routing
disabled and returns that.
Failover semantics
PrivateRouter's auto-router has one trigger: a per-attempt
timeout. When the upstream model doesn't return its first response
chunk before timeout_ms, we abort that attempt and try the next model
in your chain. We stop walking the chain as soon as any of these happens:
- a model returns a response (success or otherwise), or
- we've exhausted your max_attempts, or
- we've run out of healthy models in the chain.
What does not trigger failover:
- HTTP 5xx errors from upstream — these usually mean the model returned something, just not what you wanted. Failing over could double-bill you.
- HTTP 429 (rate-limited) — same reasoning.
- Other HTTP 4xx errors (for example, 400 Bad Request): your payload is the problem; retrying with a different model won't help.
If you want failover on those conditions, file a feature request — for now we're keeping the trigger surface deliberately minimal.
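The rules above can be sketched as a small loop. This is an illustrative model of the documented behaviour, not PrivateRouter's actual implementation: `walk_chain`, `call_model`, `is_healthy`, and `AttemptTimeout` are hypothetical names, and two details are assumptions: unhealthy models are skipped without counting as an attempt, and a chain with no usable model ends in the same 504 as a chain where every attempt timed out.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class AttemptTimeout(Exception):
    """Raised by the hypothetical upstream call when no first chunk arrives in time."""


@dataclass
class Outcome:
    served_by: Optional[str]  # model that returned a response, or None
    status: int               # upstream HTTP status, or 504 if nothing responded
    attempts: int             # number of models actually tried


def walk_chain(
    preferred: str,
    fallback_chain: Sequence[str],
    max_attempts: int,
    call_model: Callable[[str], int],
    is_healthy: Callable[[str], bool] = lambda model: True,
) -> Outcome:
    attempts = 0
    for model in [preferred, *fallback_chain]:
        if attempts >= max_attempts:
            break  # attempt budget exhausted
        if not is_healthy(model):
            continue  # assumption: skipped models do not consume an attempt
        attempts += 1
        try:
            status = call_model(model)
        except AttemptTimeout:
            continue  # the only failover trigger: a per-attempt timeout
        # Any response ends the walk, including 4xx, 429, and 5xx.
        return Outcome(model, status, attempts)
    return Outcome(None, 504, attempts)
```

Note that a 500 from the first model is returned to the caller as-is; only `AttemptTimeout` moves the loop on to the next model.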
Per-request override
If you want a specific model for one request, just call it by name.
privaterouter/qwen-pro always means "use qwen-pro, single shot, no
failover", regardless of whether auto-routing is enabled. Only the exact
string privaterouter/auto triggers the chain.
This is useful when you want deterministic behaviour for production
traffic but still want privaterouter/auto available for development.
Limits
| Field | Range | Default |
|---|---|---|
| timeout_ms | 1000 – 120000 | 30000 |
| max_attempts | 1 – 10 | 3 |
| Fallback chain length | 0 – 10 | 0 |
Validation rules:
- The sentinel privaterouter/auto may not appear in the fallback chain or as the preferred model (it's the trigger, not a target).
- A model can't appear more than once in your chain.
- Your preferred model must not also be in your fallback chain.
- All referenced models must be on your current plan.
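If you build policies programmatically, it can help to check them locally before sending the PUT. The following is a client-side sketch of the limits and validation rules listed above; `validate_policy` and its error strings are hypothetical, the server's own error format may differ, and a missing or null preferred model is assumed to mean "cheapest healthy".

```python
AUTO = "privaterouter/auto"


def validate_policy(policy: dict, plan_models: set) -> list:
    """Return a list of human-readable problems; an empty list means the policy looks valid."""
    errors = []
    preferred = policy.get("preferred_model_public_name")  # None: assumed "cheapest healthy"
    chain = policy.get("fallback_chain_public_names", [])
    timeout_ms = policy.get("timeout_ms", 30000)  # documented default
    max_attempts = policy.get("max_attempts", 3)  # documented default

    if not 1000 <= timeout_ms <= 120000:
        errors.append("timeout_ms must be between 1000 and 120000")
    if not 1 <= max_attempts <= 10:
        errors.append("max_attempts must be between 1 and 10")
    if len(chain) > 10:
        errors.append("fallback chain may contain at most 10 models")
    if preferred == AUTO or AUTO in chain:
        errors.append(f"{AUTO} is the trigger and cannot be a target")
    if len(set(chain)) != len(chain):
        errors.append("a model may appear only once in the fallback chain")
    if preferred is not None and preferred in chain:
        errors.append("the preferred model must not also be in the fallback chain")

    referenced = set(chain) | ({preferred} if preferred else set())
    missing = sorted(referenced - plan_models - {AUTO})
    if missing:
        errors.append("models not on your plan: " + ", ".join(missing))
    return errors
```

`plan_models` is the set of public model names available on your plan; how you obtain that set is left to the caller.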
Billing
The cost line on your invoice reflects the model that served each request, not the sentinel. If your preferred model timed out and the second model served, you pay for the second model. If all attempts timed out, you pay nothing for that request (we return 504).
This is intentional: it keeps the cost transparent and makes auto-routing safe to leave on across your whole fleet.
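For reconciliation, one option is to group token usage by the model reported in each response. This sketch assumes each successful response body carries the OpenAI-style `usage.total_tokens` field alongside `model`; that field is an assumption based on the API being OpenAI-compatible, and `tokens_by_served_model` is a hypothetical helper. Requests that ended in a 504 have no serving model and are simply not in the list.

```python
from collections import defaultdict


def tokens_by_served_model(responses: list) -> dict:
    """Sum total tokens per model that actually served a request."""
    totals = defaultdict(int)
    for body in responses:
        # body["model"] is the serving model, never the privaterouter/auto sentinel.
        totals[body["model"]] += body["usage"]["total_tokens"]
    return dict(totals)
```

Multiplying each total by that model's per-token rate should then line up with the per-model cost lines on your invoice.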
Composing with plugins
The Pareto Router plugin layers a
task-aware tier hint on top of privaterouter/auto. Pass
routing_tier (fast / code / quality) in your request body, or
set a default on the Plugins page, and the router prepends a
family-preferred model to your existing fallback chain — so tier hints
and the timeout-fallback chain compose cleanly.
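A tier hint can be added to an existing request body with a small helper. This is a sketch under assumptions: `with_routing_tier` is a hypothetical name, and `routing_tier` is assumed to sit at the top level of the JSON body next to `model` and `messages`. Only the three tier values named above are accepted.

```python
VALID_TIERS = ("fast", "code", "quality")


def with_routing_tier(body: dict, tier: str) -> dict:
    """Return a copy of a chat-completions body with a routing_tier hint added."""
    if tier not in VALID_TIERS:
        raise ValueError(f"routing_tier must be one of {VALID_TIERS}, got {tier!r}")
    # Assumption: the hint is a top-level field of the request body.
    return {**body, "routing_tier": tier}
```

The original body is left untouched, so the same base request can be reused with different tiers.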