Public model leaderboard

PrivateRouter publishes a real-time-ish (hourly-refreshed) leaderboard of how every hosted model is performing: latency, success rate, requests served, and cost. No authentication is required. The numbers come from the same usage data that dashboard users see for their own keys, aggregated across the platform.

URL: /leaderboard
API: GET /api/leaderboard (no auth)

Why this exists

Two reasons:

Transparency. PrivateRouter runs on self-hosted open models. We can't hide behind a brand promise — you get to verify the infrastructure works before signing up.
Comparison. Developers shopping for an AI provider need real numbers, not benchmarks-on-blog-posts. The leaderboard shows what actually happens in production.

How it's computed

Every API request becomes a usage_event row with token counts, latency, cost, and HTTP status. Once an hour (at HH:05 UTC), a background job aggregates the prior hour's events into model_perf_hourly — one row per (model, hour). The leaderboard endpoint sums these rows over a window (7 or 30 days), recomputes percentile latencies, and produces a composite ranking score.
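Here is a minimal sketch of the rollup and window steps. Per-hour p95 values cannot be averaged into a correct 7- or 30-day p95. To "recompute" percentiles from hourly rows, those rows need something mergeable, such as a latency histogram. The sketch assumes each hourly row stores one. The doc does not say how model_perf_hourly actually stores latencies. UsageEvent, HourlyRow, rollup, window_stats and the bucket edges in BUCKETS_MS are all hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical latency bucket upper edges in ms (not PrivateRouter's real values).
BUCKETS_MS = [100, 250, 500, 1000, 2500, 5000, 10000]

@dataclass
class UsageEvent:
    model: str
    ts: datetime  # assumed UTC
    latency_ms: int
    ok: bool

@dataclass
class HourlyRow:
    requests: int = 0
    successes: int = 0
    latency_hist: Counter = field(default_factory=Counter)  # bucket edge -> count

def bucket_upper(latency_ms):
    """Return the upper edge of the bucket this latency falls into."""
    for edge in BUCKETS_MS:
        if latency_ms <= edge:
            return edge
    return float("inf")  # overflow bucket

def rollup(events):
    """Aggregate raw events into one HourlyRow per (model, hour)."""
    rows = defaultdict(HourlyRow)
    for e in events:
        hour = e.ts.replace(minute=0, second=0, microsecond=0)
        row = rows[(e.model, hour)]
        row.requests += 1
        row.successes += e.ok
        row.latency_hist[bucket_upper(e.latency_ms)] += 1
    return dict(rows)

def window_stats(rows, model, now, window_days):
    """Sum hourly rows inside the window and estimate p50/p95 from the merged histogram."""
    start = now - timedelta(days=window_days)
    total, ok, hist = 0, 0, Counter()
    for (m, hour), row in rows.items():
        if m == model and start <= hour < now:
            total += row.requests
            ok += row.successes
            hist.update(row.latency_hist)

    def percentile(q):
        # Walk buckets in order; report the upper edge of the bucket that crosses q.
        target, seen = q * total, 0
        for edge in sorted(hist):
            seen += hist[edge]
            if seen >= target:
                return edge
        return None

    return {
        "requests": total,
        "success_rate": ok / total if total else None,
        "p50_latency_ms": percentile(0.50),
        "p95_latency_ms": percentile(0.95),
    }
```

Because percentiles are read off bucket edges, they are upper bounds at bucket resolution. Finer buckets, or a sketch structure such as t-digest, trade storage for accuracy.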

Sample size threshold: models with fewer than 100 requests in the window are listed separately as "Not enough data yet" rather than ranked alongside high-traffic models. This avoids ranking new models on noisy data.
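The threshold split is a simple partition. In this sketch the input shape (a dict of model name to request count) and the function name are hypothetical. The 100-request cutoff is the documented one.

```python
MIN_REQUESTS = 100  # documented threshold: fewer than this is "not enough data"

def split_by_sample_size(request_counts):
    """Split {model: requests_in_window} into (rankable, insufficient) name lists."""
    rankable = sorted(m for m, n in request_counts.items() if n >= MIN_REQUESTS)
    insufficient = sorted(m for m, n in request_counts.items() if n < MIN_REQUESTS)
    return rankable, insufficient
```

A model with exactly 100 requests is ranked. One with 99 lands in the insufficient_data_models list.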

Ranking formula

ranking_score =
    0.40 * normalized_success_rate
  + 0.30 * (1 - normalized_p95_latency)
  + 0.30 * (1 - normalized_output_cost)

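Each component is normalized to [0, 1] across the cohort, then combined. Latency and cost enter as 1 minus their normalized value, so lower latency and lower cost raise the score. A higher ranking_score is better. The weights deliberately put reliability (40%) ahead of latency and cost (30% each): a model that 502s is useless no matter how cheap.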
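The sketch below shows one way to compute the score. It assumes min-max normalization, which the doc does not specify; "normalized to [0, 1] across the cohort" could also be implemented other ways. min_max and ranking_scores are hypothetical names. Field names follow the response shape shown under API.

```python
def min_max(values):
    """Min-max normalize to [0, 1]. If all values are equal, return 0.0 for each:
    a constant component shifts every score equally, so ordering is unaffected."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def ranking_scores(entries):
    """Apply 0.40*success + 0.30*(1 - p95) + 0.30*(1 - output cost) per entry."""
    success = min_max([e["success_rate"] for e in entries])
    p95 = min_max([e["p95_latency_ms"] for e in entries])
    cost = min_max([float(e["output_price_per_million_usd"]) for e in entries])
    return [
        0.40 * s + 0.30 * (1 - l) + 0.30 * (1 - c)
        for s, l, c in zip(success, p95, cost)
    ]
```

With two models, the more reliable but pricier one scores 0.7 and the cheaper, slower, less reliable one scores 0.3. This shows how the 40% reliability weight dominates when the other two components pull in opposite directions.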

You can override the sort order with ?sort_by=p95_latency, ?sort_by=cost, or ?sort_by=throughput.
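Server-side, the sort_by override can be sketched as a lookup table of key functions. The backing fields are assumptions, since the doc does not say which fields back cost and throughput. This sketch sorts cost by output_price_per_million_usd, ascending. It treats throughput as requests served, descending. SORT_KEYS and sort_entries are hypothetical names.

```python
from decimal import Decimal

# Hypothetical mapping: sort_by value -> (key function, sort descending?)
SORT_KEYS = {
    "ranking_score": (lambda e: e["ranking_score"], True),
    "p95_latency": (lambda e: e["p95_latency_ms"], False),  # fastest first
    "cost": (lambda e: Decimal(e["output_price_per_million_usd"]), False),  # assumed field
    "throughput": (lambda e: e["requests"], True),  # assumed: requests served
}

def sort_entries(entries, sort_by="ranking_score"):
    """Return entries ordered by the requested key; reject unknown sort_by values."""
    try:
        key, descending = SORT_KEYS[sort_by]
    except KeyError:
        raise ValueError(f"unsupported sort_by: {sort_by!r}") from None
    return sorted(entries, key=key, reverse=descending)
```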

API

curl "https://api.privaterouter.com/api/leaderboard?window_days=7&sort_by=ranking_score"
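The same request from Python, using only the standard library. No auth header is sent because the endpoint is public. Restricting window_days to 7 or 30 on the client mirrors the two documented windows. It is a choice made for this sketch, not a documented API rule. leaderboard_url and fetch_leaderboard are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.privaterouter.com"

def leaderboard_url(window_days=7, sort_by="ranking_score"):
    """Build the public leaderboard URL with query parameters."""
    if window_days not in (7, 30):
        raise ValueError("window_days must be 7 or 30")
    query = urlencode({"window_days": window_days, "sort_by": sort_by})
    return f"{BASE_URL}/api/leaderboard?{query}"

def fetch_leaderboard(window_days=7, sort_by="ranking_score", timeout=10):
    """GET the leaderboard (no authentication) and decode the JSON body."""
    with urlopen(leaderboard_url(window_days, sort_by), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_leaderboard(30, "p95_latency")["entries"]` would list the 30-day cohort ordered by tail latency.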

Response shape:

{
  "window_days": 7,
  "total_entries": 4,
  "generated_at": "2026-05-17T19:30:00Z",
  "entries": [
    {
      "model_public_name": "privaterouter/qwen-pro",
      "family": "qwen",
      "context_window": 32768,
      "is_featured": true,
      "requests": 12480,
      "success_rate": 0.994,
      "error_rate": 0.006,
      "p50_latency_ms": 312,
      "p95_latency_ms": 980,
      "total_tokens_in": 4500000,
      "total_tokens_out": 3200000,
      "total_cost_usd": "9.42",
      "input_price_per_million_usd": "0.500000",
      "output_price_per_million_usd": "1.500000",
      "ranking_score": 0.873
    }
  ],
  "insufficient_data_models": ["privaterouter/deepseek-reason"]
}
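The money fields in the sample (total_cost_usd and the per-million prices) are JSON strings rather than numbers. This is presumably so clients can parse them without float rounding. A parsing sketch that keeps them as Decimal is shown below. LeaderboardEntry and parse_leaderboard are hypothetical names, and only a subset of the fields is kept for brevity.

```python
import json
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class LeaderboardEntry:
    model_public_name: str
    requests: int
    success_rate: float
    p95_latency_ms: int
    total_cost_usd: Decimal
    output_price_per_million_usd: Decimal
    ranking_score: float

def parse_leaderboard(payload):
    """Parse a leaderboard JSON string into (entries, insufficient_data_models)."""
    data = json.loads(payload)
    entries = [
        LeaderboardEntry(
            model_public_name=e["model_public_name"],
            requests=e["requests"],
            success_rate=e["success_rate"],
            p95_latency_ms=e["p95_latency_ms"],
            # String-encoded money fields become exact Decimals, not floats.
            total_cost_usd=Decimal(e["total_cost_usd"]),
            output_price_per_million_usd=Decimal(e["output_price_per_million_usd"]),
            ranking_score=e["ranking_score"],
        )
        for e in data["entries"]
    ]
    return entries, data["insufficient_data_models"]
```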

Per-model detail: GET /api/leaderboard/{model_public_name} (note: model_public_name contains a /, e.g. /api/leaderboard/privaterouter/qwen-pro).

Returns the same entry data plus a 168-hour (7-day) hourly breakdown for charting.
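Because the model name contains a slash that stays part of the path, the client should not percent-encode it as %2F. Python's urllib.parse.quote leaves "/" unescaped by default, since its safe parameter defaults to "/". It still escapes characters such as spaces. model_detail_url is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "https://api.privaterouter.com"

def model_detail_url(model_public_name):
    """Build the per-model detail URL, keeping '/' in the name as a path separator."""
    return f"{BASE_URL}/api/leaderboard/{quote(model_public_name)}"
```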

Caching

The endpoint caches its response server-side in Redis for 5 minutes and sends Cache-Control: public, max-age=300, so browsers and CDNs may also reuse it for 5 minutes. Repeated fetches return the same payload until the TTL expires or the next hourly rollup runs.
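The cache-aside pattern behind this can be sketched with an in-memory stand-in for Redis. In Redis itself, the equivalent write is SET with EX 300. The clearing hook models the "or the next hourly rollup runs" behavior. TTLCache, get_or_compute, invalidate_all, and using something like (window_days, sort_by) as the cache key are all assumptions for illustration.

```python
import time

class TTLCache:
    """In-memory stand-in for the Redis cache: each entry expires ttl seconds after it is stored."""

    def __init__(self, ttl=300, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value if still fresh, else compute, store, and return it."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate_all(self):
        """Hypothetical hook to call after the hourly rollup writes new rows."""
        self._store.clear()

CACHE_CONTROL = "public, max-age=300"  # header value documented above
```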

If you need sub-minute freshness, the per-user dashboard (/keys/{id}/analytics) reads usage_events directly — but for public display this 5-min cadence is intentional.

SEO

Each model gets a dedicated detail page at /leaderboard/{slug} with:

<meta> tags including key stats in the description
JSON-LD Product schema for search-engine rendering
Open Graph + Twitter card metadata
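The JSON-LD Product block for a detail page could be generated roughly as shown below. Product, Offer and PropertyValue are real schema.org types, and additionalProperty, offers, price and priceCurrency are real schema.org properties. Which stats PrivateRouter actually includes is not documented. Mapping the output price per million tokens to Offer.price is an assumption of this sketch. product_jsonld and jsonld_script_tag are hypothetical names.

```python
import json

def product_jsonld(entry):
    """Build a schema.org Product object from one leaderboard entry (field choice assumed)."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": entry["model_public_name"],
        "additionalProperty": [
            {"@type": "PropertyValue", "name": "p95 latency (ms)", "value": entry["p95_latency_ms"]},
            {"@type": "PropertyValue", "name": "Success rate", "value": entry["success_rate"]},
        ],
        "offers": {
            "@type": "Offer",
            "price": entry["output_price_per_million_usd"],  # assumed: per 1M output tokens
            "priceCurrency": "USD",
        },
    }

def jsonld_script_tag(entry):
    """Wrap the JSON-LD in a script tag; escape '</' so data cannot close the tag early."""
    body = json.dumps(product_jsonld(entry)).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'
```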

The sitemap (/sitemap.xml) lists every model detail page plus the main marketing pages. robots.txt allows public crawling while blocking /dashboard, /admin, and other authenticated routes.
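A sketch of generating both files follows. The urlset/url/loc structure and namespace come from the sitemaps.org protocol. The Sitemap: directive in robots.txt is a widely supported convention, but whether PrivateRouter's robots.txt uses it is an assumption. The site base URL and the exact page paths are not given in this doc, so they are parameters here. build_sitemap and build_robots_txt are hypothetical names.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(base_url, paths):
    """Return a minimal sitemap XML string with one <url><loc> per path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return ET.tostring(urlset, encoding="unicode")

def build_robots_txt(base_url, disallowed):
    """Allow all crawlers except on the given authenticated path prefixes."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {p}" for p in disallowed]
    lines.append(f"Sitemap: {base_url.rstrip('/')}/sitemap.xml")  # assumed directive
    return "\n".join(lines) + "\n"
```

For example, `build_robots_txt(site, ["/dashboard", "/admin"])` covers the two routes named above. Other authenticated routes would be appended to the same list.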

Defining "success"

A request counts as success when:

HTTP status code = 200
Response stream completed (token usage was recorded)
The model returned at least one token

A request counts as error when:

HTTP 4xx/5xx
Stream aborted before any tokens
Timeout (the M11 router timeout fallback was triggered)
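The two lists can be expressed as a classifier. Error conditions are checked first, so a timed-out request counts as an error even if a fallback later produced output, which matches the list above. The lists do not cover every case. Examples are a 200 whose stream completed with zero tokens, or a stream aborted after some tokens. This sketch returns "unclassified" for those rather than guessing. RequestOutcome and classify are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class RequestOutcome:
    status: int             # HTTP status code
    stream_completed: bool  # stream finished and token usage was recorded
    tokens_out: int         # tokens the model returned
    timed_out: bool = False # router timeout fallback was triggered

def classify(o):
    """Apply the documented success/error rules; anything else is 'unclassified'."""
    if o.timed_out or 400 <= o.status < 600 or (not o.stream_completed and o.tokens_out == 0):
        return "error"
    if o.status == 200 and o.stream_completed and o.tokens_out >= 1:
        return "success"
    return "unclassified"
```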