How PrivateRouter models actually perform
Updated hourly from real production traffic. No synthetic benchmarks, no vendor self-reports.
Why these numbers?
Every request that flows through PrivateRouter writes a row into our usage log: model, latency, token counts, status, and cost. Once an hour, a rollup job aggregates those rows into per-model windowed buckets (7 days and 30 days). This page reads the latest rollup directly — there's no hand-curation between the proxy and the table above.
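As a rough illustration of what such a rollup could look like, here is a minimal Python sketch. The `UsageRow` fields, the `rollup` function, and the bucket keys are hypothetical names chosen for this example, not the actual schema. It also assumes that p50 latency is taken over every request in the window and that the blended price is total cost divided by total tokens (input plus output); the real job may define either differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class UsageRow:
    # Hypothetical shape of one usage-log row.
    model: str
    timestamp: datetime
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    ok: bool          # True when the request counted as successful
    cost_usd: float


def rollup(rows, now, window_days):
    """Aggregate rows from the last `window_days` into one bucket per model."""
    cutoff = now - timedelta(days=window_days)
    by_model = defaultdict(list)
    for row in rows:
        if cutoff <= row.timestamp <= now:
            by_model[row.model].append(row)

    buckets = {}
    for model, group in by_model.items():
        tokens = sum(r.prompt_tokens + r.completion_tokens for r in group)
        cost = sum(r.cost_usd for r in group)
        buckets[model] = {
            "requests": len(group),
            "p50_latency_ms": median(r.latency_ms for r in group),
            "success_rate": sum(r.ok for r in group) / len(group),
            # Assumed definition of "blended": total cost over total tokens.
            "usd_per_1m_tokens": cost / tokens * 1_000_000 if tokens else None,
        }
    return buckets
```

Calling `rollup(rows, now, 7)` and `rollup(rows, now, 30)` on the same rows would produce the two windows described above.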
A model needs at least 100 requests in the selected window before it's ranked. Below that, the numbers are too noisy to publish — those models appear in the "Not enough data yet" section instead.
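The cutoff can be pictured as a simple partition step. The sketch below assumes a hypothetical dict that maps each model name to its stats, including a `requests` count; the function and variable names are illustrative only.

```python
MIN_REQUESTS = 100  # minimum requests in the selected window to be ranked


def split_by_sample_size(buckets, min_requests=MIN_REQUESTS):
    """Separate models with enough traffic to rank from those without."""
    ranked, not_enough_data = {}, {}
    for model, stats in buckets.items():
        # "At least 100" means the threshold itself qualifies.
        target = ranked if stats["requests"] >= min_requests else not_enough_data
        target[model] = stats
    return ranked, not_enough_data
```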
A request is counted as successful only if it returned 200 OK and the token stream completed cleanly (no mid-stream disconnect, no provider-side abort, no timeout). Anything else — 4xx, 5xx, dropped stream — counts as an error.
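Expressed as code, the rule might look like the following sketch. `RequestOutcome` and `StreamEnd` are hypothetical types invented for this example; how the proxy actually records stream termination is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class StreamEnd(Enum):
    # Hypothetical ways a token stream can end.
    COMPLETED = "completed"
    DISCONNECTED = "disconnected"      # mid-stream disconnect
    PROVIDER_ABORT = "provider_abort"  # provider-side abort
    TIMEOUT = "timeout"


@dataclass
class RequestOutcome:
    http_status: int
    stream_end: StreamEnd


def is_success(outcome):
    """Success requires both a 200 status and a cleanly completed stream."""
    return outcome.http_status == 200 and outcome.stream_end is StreamEnd.COMPLETED
```

Under this rule a 200 response whose stream later times out is still an error, which is stricter than counting HTTP status alone.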
Models are ranked by a composite score. In plain English, it balances three things:
- Speed — lower p50 latency is better.
- Reliability — higher success rate is better.
- Price — lower blended $/1M tokens is better.
The three are normalized within the current window, then combined. No single metric dominates: a fast-but-flaky model won't beat a slightly-slower-but-reliable one, and a cheap model with terrible latency won't top the list either.
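To make the idea concrete, here is one way a score like this could be computed. This is a sketch under assumptions, not the exact formula used for the ranking: it assumes min-max normalization across the models in the window and equal weights for the three metrics, and the function names and stats keys are hypothetical.

```python
def min_max(values, higher_is_better):
    """Scale values to [0, 1] so that 1.0 is always the best in the window."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # every model ties on this metric
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def composite_scores(stats, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine speed, reliability, and price; equal weights are an assumption."""
    models = list(stats)
    speed = min_max([stats[m]["p50_latency_ms"] for m in models], False)
    reliability = min_max([stats[m]["success_rate"] for m in models], True)
    price = min_max([stats[m]["usd_per_1m_tokens"] for m in models], False)
    w_speed, w_rel, w_price = weights
    return {
        m: w_speed * s + w_rel * r + w_price * p
        for m, s, r, p in zip(models, speed, reliability, price)
    }


# Made-up example data, not real measurements.
example = {
    "fast_flaky": {"p50_latency_ms": 200, "success_rate": 0.80, "usd_per_1m_tokens": 2.0},
    "steady": {"p50_latency_ms": 300, "success_rate": 0.99, "usd_per_1m_tokens": 1.0},
    "cheap_slow": {"p50_latency_ms": 2000, "success_rate": 0.99, "usd_per_1m_tokens": 0.5},
}
scores = composite_scores(example)
ranking = sorted(scores, key=scores.get, reverse=True)
# ranking == ['steady', 'cheap_slow', 'fast_flaky']
```

With these made-up numbers the slightly slower but reliable model comes out on top, ahead of both the fast-but-flaky model and the cheap-but-slow one.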
These numbers are aggregates across all traffic. No prompts, completions, user IDs, or API key fingerprints are exposed on this page — just counts, latencies, and dollar amounts rolled up per model.