Auto-routing

PrivateRouter ships a special model name, privaterouter/auto, that lets you delegate model selection to the platform. When you call it, the router picks the underlying model for you and, if your preferred model times out, automatically falls back through your configured chain.

Auto-routing is opt-in and disabled by default. Until you turn it on, privaterouter/auto simply routes to the cheapest healthy model on your plan.

Auto-routing is included on every plan. There's no surcharge for using privaterouter/auto: you pay for the model that actually served the request, billed at that model's normal per-token rate.

Quickstart

Drop-in replacement for any OpenAI-compatible client:

curl https://api.privaterouter.com/v1/chat/completions \
  -H "Authorization: Bearer $PRIVATEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "privaterouter/auto",
    "messages": [{"role": "user", "content": "Summarize the last commit message."}]
  }'

The response's model field reflects the model that actually served your request, not the privaterouter/auto sentinel. That makes it useful for billing reconciliation and debugging.
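The same call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_request, served_model, and ask are hypothetical, and the 60-second client timeout is an arbitrary choice. The URL, headers, and body mirror the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.privaterouter.com/v1/chat/completions"
SENTINEL = "privaterouter/auto"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = {
        "model": SENTINEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def served_model(response_body: dict) -> str:
    """Return the model that served the request, read from the model field."""
    return response_body["model"]


def ask(prompt: str, api_key: str) -> str:
    """Send the request and return the name of the serving model."""
    request = build_request(prompt, api_key)
    # The 60 s client-side timeout is an assumption, not a documented value.
    with urllib.request.urlopen(request, timeout=60) as response:
        return served_model(json.load(response))
```

Logging the value returned by ask next to each request is one way to reconcile usage against an invoice, since the sentinel itself never appears there.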

Configuring via the UI

The simplest way: open the Routing page in your dashboard (/routing). There you can:

Toggle Enable auto-routing to opt in.
Pick a Preferred model (or leave on Cheapest healthy).
Build your Fallback chain — up to 10 models, tried in order on timeout.
Tune Per-attempt timeout (1 – 120s) and Max attempts (1 – 10) under Advanced.

A small cyan badge at the top of the page shows AUTO-ROUTING ACTIVE when enabled.

Configuring via the API

The same settings can be written with PUT /api/routing/policy. This endpoint uses session auth, not API-key auth:

curl -X PUT https://api.privaterouter.com/api/routing/policy \
  -H "Cookie: pr_session=$SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "preferred_model_public_name": "privaterouter/qwen-pro",
    "fallback_chain_public_names": [
      "privaterouter/qwen-fast",
      "privaterouter/fast"
    ],
    "timeout_ms": 20000,
    "max_attempts": 3
  }'

GET /api/routing/policy returns the current policy. The first time you call it, it creates and returns a disabled default.

Failover semantics

PrivateRouter's auto-router has one trigger: a per-attempt timeout. When the upstream model doesn't return its first response chunk within timeout_ms, we abort that attempt and try the next model in your chain. We stop walking the chain as soon as any of these happens:

a model returns a response (a success or an error status),
we've exhausted your max_attempts, or
we've run out of healthy models in the chain.

What does not trigger failover:

HTTP 5xx errors from upstream — these usually mean the model returned something, just not what you wanted. Failing over could double-bill you.
HTTP 429 (rate-limited) — same reasoning.
Other HTTP 4xx errors (for example, 400 Bad Request) — your payload is the problem; retrying with a different model won't help.

If you want failover on those conditions, file a feature request — for now we're keeping the trigger surface deliberately minimal.
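The rules above can be modelled as a short loop. The sketch below is illustrative only and is not the router's actual implementation: walk_chain and call_upstream are hypothetical names, call_upstream stands in for one upstream attempt (returning an HTTP status, or None when no first chunk arrived before timeout_ms), and health checks are left out for brevity.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for one upstream attempt: returns an HTTP status code,
# or None if the first response chunk did not arrive before timeout_ms.
Upstream = Callable[[str], Optional[int]]


def walk_chain(
    preferred: str,
    fallback_chain: List[str],
    max_attempts: int,
    call_upstream: Upstream,
) -> Tuple[int, Optional[str], List[str]]:
    """Return (status, serving_model, models_tried) for one auto-routed request."""
    tried: List[str] = []
    # The preferred model goes first, then the chain, capped at max_attempts.
    for model in [preferred, *fallback_chain][:max_attempts]:
        tried.append(model)
        status = call_upstream(model)
        if status is not None:
            # Any response ends the walk, including 4xx, 429, and 5xx.
            return status, model, tried
        # Timeout: move on to the next model.
    # Nothing but timeouts: no model served the request.
    return 504, None, tried


# Simulated outcomes: the preferred model times out, the next returns a 500.
outcomes = {
    "privaterouter/qwen-pro": None,
    "privaterouter/qwen-fast": 500,
    "privaterouter/fast": 200,
}
result = walk_chain(
    "privaterouter/qwen-pro",
    ["privaterouter/qwen-fast", "privaterouter/fast"],
    3,
    outcomes.get,
)
print(result)
```

In this simulation the walk stops at privaterouter/qwen-fast with status 500 and never reaches privaterouter/fast, because a 5xx counts as a response rather than a timeout.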

Per-request override

If you want a specific model for one request, just call it by name. privaterouter/qwen-pro always means "use qwen-pro, single shot, no failover", regardless of whether auto-routing is enabled. Only the exact string privaterouter/auto triggers the chain.

This is useful when you want deterministic behaviour for production traffic but still want privaterouter/auto available for development.

Limits

Field	Range	Default
`timeout_ms`	1000 – 120000	30000
`max_attempts`	1 – 10	3
Fallback chain length	0 – 10	0

Validation rules:

The sentinel privaterouter/auto may not appear in the fallback chain or as the preferred model (it's the trigger, not a target).
A model can't appear more than once in your chain.
Your preferred model must not also be in your fallback chain.
All referenced models must be on your current plan.
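If you build policies programmatically, it can help to check them client-side before sending the PUT. The sketch below encodes the limits and validation rules listed above. It is an illustration, not the server's validator: validate_policy and plan_models are hypothetical names, the error strings are made up, and treating a missing preferred model as "cheapest healthy" is an assumption.

```python
from typing import List, Set

SENTINEL = "privaterouter/auto"


def validate_policy(policy: dict, plan_models: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    timeout_ms = policy.get("timeout_ms", 30000)  # documented default
    max_attempts = policy.get("max_attempts", 3)  # documented default
    # Assumption: a missing preferred model means "cheapest healthy".
    preferred = policy.get("preferred_model_public_name")
    chain = policy.get("fallback_chain_public_names", [])

    if not 1000 <= timeout_ms <= 120000:
        errors.append("timeout_ms must be between 1000 and 120000")
    if not 1 <= max_attempts <= 10:
        errors.append("max_attempts must be between 1 and 10")
    if len(chain) > 10:
        errors.append("fallback chain may hold at most 10 models")
    if preferred == SENTINEL or SENTINEL in chain:
        errors.append("privaterouter/auto is the trigger, not a target")
    if len(set(chain)) != len(chain):
        errors.append("a model can't appear more than once in the chain")
    if preferred is not None and preferred in chain:
        errors.append("preferred model must not also be in the chain")
    for name in [preferred, *chain]:
        # The sentinel is already reported above, so skip it here.
        if name is not None and name != SENTINEL and name not in plan_models:
            errors.append(f"{name} is not on your current plan")
    return errors
```

Returning every problem at once, rather than raising on the first, lets a caller fix a policy in a single pass.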

Billing

The cost line on your invoice reflects the model that served each request, not the sentinel. If your preferred model times out and the next model in your chain serves the request, you pay for that model. If all attempts time out, you pay nothing for that request (we return 504).

This is intentional: it keeps the cost transparent and makes auto-routing safe to leave on across your whole fleet.
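As a worked illustration of that rule, the sketch below charges a request at the serving model's rate and charges nothing when no model served it. The rates are invented for the example and are not PrivateRouter's prices. request_cost is a hypothetical helper, and real invoices may price input and output tokens separately.

```python
from decimal import Decimal
from typing import Dict, Optional

# Made-up rates in USD per million tokens, for illustration only.
EXAMPLE_RATES: Dict[str, Decimal] = {
    "privaterouter/qwen-pro": Decimal("2.00"),
    "privaterouter/qwen-fast": Decimal("0.40"),
}


def request_cost(
    serving_model: Optional[str],
    tokens: int,
    rates_per_million: Dict[str, Decimal],
) -> Decimal:
    """Cost of one request: the serving model's rate, or zero if none served."""
    if serving_model is None:
        # Every attempt timed out (504): nothing is billed.
        return Decimal("0")
    return rates_per_million[serving_model] * tokens / 1_000_000


# The preferred model timed out and privaterouter/qwen-fast served 1500 tokens.
fallback_cost = request_cost("privaterouter/qwen-fast", 1500, EXAMPLE_RATES)
# Every attempt timed out, so the request is free.
timeout_cost = request_cost(None, 1500, EXAMPLE_RATES)
```

With these example rates, fallback_cost works out to 0.0006 USD, which is the fallback model's price rather than the preferred model's, and timeout_cost is zero.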

Composing with plugins

The Pareto Router plugin layers a task-aware tier hint on top of privaterouter/auto. Pass routing_tier (fast, code, or quality) in your request body, or set a default on the Plugins page. The router then prepends a family-preferred model to your existing fallback chain, so tier hints and the timeout-fallback chain compose cleanly.
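A request body carrying a tier hint might be built as follows. This is a sketch under assumptions: build_body is a hypothetical helper, and placing routing_tier as a top-level field next to model and messages is inferred from "in your request body" rather than confirmed here.

```python
from typing import Optional

SENTINEL = "privaterouter/auto"
ROUTING_TIERS = ("fast", "code", "quality")


def build_body(prompt: str, routing_tier: Optional[str] = None) -> dict:
    """Build a chat-completions body, optionally with a routing_tier hint."""
    body = {
        "model": SENTINEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    if routing_tier is not None:
        if routing_tier not in ROUTING_TIERS:
            raise ValueError(f"routing_tier must be one of {ROUTING_TIERS}")
        # Assumption: the hint is a top-level field of the request body.
        body["routing_tier"] = routing_tier
    return body
```

Leaving routing_tier unset sends a plain privaterouter/auto request, so any default configured on the Plugins page would presumably apply.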