Public model leaderboard
PrivateRouter publishes an hourly-updated leaderboard showing how every hosted model is performing: latency, success rate, requests served, and cost. No authentication is required. It draws on the same data that dashboard users see for their own keys, aggregated across the platform.
URL: /leaderboard
API: GET /api/leaderboard (no auth)
Why this exists
Two reasons:
- Transparency. PrivateRouter runs on self-hosted open models. We can't hide behind a brand promise — you get to verify the infrastructure works before signing up.
- Comparison. Developers shopping for an AI provider need real numbers, not benchmarks-on-blog-posts. The leaderboard shows what actually happens in production.
How it's computed
Every API request becomes a usage_event row with token counts,
latency, cost, and HTTP status. Once an hour (at HH:05 UTC), a
background job aggregates the prior hour's events into
model_perf_hourly — one row per (model, hour). The leaderboard
endpoint sums these rows over a window (7 or 30 days), recomputes
percentile latencies, and produces a composite ranking score.
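
The rollup described above can be sketched roughly as follows. This is a minimal illustration, not the production job: the `UsageEvent` field names are hypothetical, keeping sorted raw latencies per bucket is only one assumed way to make percentiles recomputable over a window, and the nearest-rank percentile method is likewise an assumption.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass
class UsageEvent:
    # Hypothetical field names; the real usage_event schema may differ.
    model: str
    created_at: datetime
    latency_ms: int
    status: int
    cost_usd: Decimal


def rollup_hourly(events):
    """Group events into one summary row per (model, hour) bucket."""
    buckets = defaultdict(list)
    for ev in events:
        # Truncate the timestamp to the start of its hour.
        hour = ev.created_at.replace(minute=0, second=0, microsecond=0)
        buckets[(ev.model, hour)].append(ev)
    return {
        key: {
            "requests": len(evs),
            "successes": sum(1 for e in evs if e.status == 200),
            "total_cost_usd": sum((e.cost_usd for e in evs), Decimal("0")),
            # Assumption: raw latencies are kept so percentiles can be
            # recomputed later across many hourly rows.
            "latencies_ms": sorted(e.latency_ms for e in evs),
        }
        for key, evs in buckets.items()
    }


def percentile(sorted_values, p):
    """Nearest-rank percentile of a non-empty, ascending list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]
```

Under these assumptions, a windowed p95 would come from merging the `latencies_ms` lists of all hourly rows in the window, sorting the result, and calling `percentile(merged, 95)`.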
Sample size threshold: models with fewer than 100 requests in the window are listed separately as "Not enough data yet" rather than ranked alongside high-traffic models. This avoids ranking new models on noisy data.
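
As an illustration of the threshold, the split between ranked and unranked models might look like the sketch below. The function name and the input shape (a mapping of model name to request count) are hypothetical.

```python
MIN_REQUESTS = 100  # threshold stated in the text above


def split_by_sample_size(request_counts, min_requests=MIN_REQUESTS):
    """Separate models with enough traffic to rank from those without."""
    ranked = sorted(m for m, n in request_counts.items() if n >= min_requests)
    # "Fewer than 100 requests" means strictly below the threshold.
    insufficient = sorted(m for m, n in request_counts.items() if n < min_requests)
    return ranked, insufficient
```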
Ranking formula
ranking_score =
    0.40 * normalized_success_rate
  + 0.30 * (1 - normalized_p95_latency)
  + 0.30 * (1 - normalized_output_cost)
Each component is normalized to [0, 1] across the cohort, then
combined. Higher is better. This deliberately weights reliability
above latency and cost, which carry equal weight: a model that 502s
is useless no matter how cheap.
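
A minimal sketch of the score, assuming min-max normalization across the ranked cohort (the text only says components are normalized to [0, 1], so the exact method is an assumption). It also assumes that `normalized_output_cost` is derived from the output price per million tokens, and it treats a cohort where every model has the same value as normalizing to 0.

```python
def min_max(values):
    """Scale values to [0, 1]; assumed normalization, not confirmed."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate cohort: every model is identical on this metric.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ranking_scores(models):
    """Compute ranking_score per model from a {name: metrics} mapping."""
    names = list(models)
    success = min_max([models[n]["success_rate"] for n in names])
    latency = min_max([models[n]["p95_latency_ms"] for n in names])
    cost = min_max([models[n]["output_price_per_million_usd"] for n in names])
    return {
        n: 0.40 * s + 0.30 * (1 - l) + 0.30 * (1 - c)
        for n, s, l, c in zip(names, success, latency, cost)
    }
```

With two models, the one that is best on all three metrics scores 1.0 and the other scores 0.0, which shows how strongly min-max normalization spreads a small cohort.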
You can override the sort order with ?sort_by=p95_latency,
?sort_by=cost, or ?sort_by=throughput.
API
curl "https://api.privaterouter.com/api/leaderboard?window_days=7&sort_by=ranking_score"
Response shape:
{
  "window_days": 7,
  "total_entries": 4,
  "generated_at": "2026-05-17T19:30:00Z",
  "entries": [
    {
      "model_public_name": "privaterouter/qwen-pro",
      "family": "qwen",
      "context_window": 32768,
      "is_featured": true,
      "requests": 12480,
      "success_rate": 0.994,
      "error_rate": 0.006,
      "p50_latency_ms": 312,
      "p95_latency_ms": 980,
      "total_tokens_in": 4500000,
      "total_tokens_out": 3200000,
      "total_cost_usd": "9.42",
      "input_price_per_million_usd": "0.500000",
      "output_price_per_million_usd": "1.500000",
      "ranking_score": 0.873
    }
  ],
  "insufficient_data_models": ["privaterouter/deepseek-reason"]
}
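
For callers who prefer Python over curl, the same request could be made with the standard library as sketched below. The helper names are hypothetical; the URL, query parameters, and field names are taken from the example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.privaterouter.com"


def leaderboard_url(window_days=7, sort_by="ranking_score"):
    """Build the leaderboard URL with its query string."""
    query = urlencode({"window_days": window_days, "sort_by": sort_by})
    return f"{BASE_URL}/api/leaderboard?{query}"


def fetch_leaderboard(window_days=7, sort_by="ranking_score", timeout=10):
    """Fetch and decode the leaderboard payload (no auth header needed)."""
    with urlopen(leaderboard_url(window_days, sort_by), timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload):
    """Return (model, success_rate, p95_latency_ms) tuples from a payload."""
    return [
        (e["model_public_name"], e["success_rate"], e["p95_latency_ms"])
        for e in payload["entries"]
    ]
```

Note that the monetary fields in the sample are strings, so parsing them with `decimal.Decimal` rather than `float` avoids rounding surprises.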
Per-model detail: GET /api/leaderboard/{model_public_name}
(note: model_public_name contains a /, e.g.
/api/leaderboard/privaterouter/qwen-pro).
Returns the same entry data plus a 168-hour (7-day) breakdown for charting.
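
Because the model name contains a slash, client code needs to keep it as a literal path separator when building the detail URL. A small sketch, with a hypothetical helper name:

```python
from urllib.parse import quote


def model_detail_url(model_public_name, base="https://api.privaterouter.com"):
    """Build the per-model detail URL, leaving "/" in the name unescaped."""
    # quote() treats "/" as safe by default, which matches the example path
    # /api/leaderboard/privaterouter/qwen-pro given in the text.
    return f"{base}/api/leaderboard/{quote(model_public_name)}"
```

Passing `safe=""` to `quote()` would turn the slash into `%2F`, which does not match the path shown above.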
Caching
The endpoint is cached server-side in Redis for 5 minutes
(Cache-Control: public, max-age=300). Subsequent fetches return the
same payload until the TTL elapses or the next hourly rollup runs.
If you need sub-minute freshness, the per-user dashboard
(/keys/{id}/analytics) reads usage_events directly — but for public
display this 5-min cadence is intentional.
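
Since the payload cannot change more often than every 5 minutes, a client that polls the endpoint may as well cache it locally for the same period. The class below is a hypothetical client-side sketch, not part of any PrivateRouter SDK.

```python
import time


class TTLCache:
    """Cache the result of a zero-argument fetch function for a fixed TTL."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds  # 300 s mirrors max-age=300
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # Refetch on first use or once the TTL has elapsed.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

The `clock` parameter exists only so the expiry logic can be exercised without waiting five minutes.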
SEO
Each model gets a dedicated detail page at /leaderboard/{slug} with:
- <meta> tags including key stats in the description
- JSON-LD Product schema for search-engine rendering
- Open Graph + Twitter card metadata
The sitemap (/sitemap.xml) lists every model + the main marketing
pages. robots.txt allows public crawling while blocking
/dashboard, /admin, and other authenticated routes.
Defining "success"
A request counts as success when all of the following hold:
- HTTP status code = 200
- Response stream completed (token usage was recorded)
- The model returned at least one token
A request counts as error when any of the following occurs:
- HTTP 4xx/5xx
- Stream aborted before any tokens
- Timeout (the M11 router timeout fallback was triggered)
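
The rules above can be expressed as a small classifier. The function and parameter names are hypothetical. The sketch also assumes that any request failing the success conditions counts as an error, which covers cases the lists do not spell out, such as a stream that aborts after some tokens were sent.

```python
def classify_request(status, stream_completed, tokens_out, timed_out=False):
    """Classify one request as "success" or "error"."""
    if timed_out:
        # The router's timeout fallback was triggered.
        return "error"
    if status == 200 and stream_completed and tokens_out >= 1:
        return "success"
    # Assumption: everything else (4xx/5xx, aborted or empty streams) is an error.
    return "error"
```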