Per-key analytics + spend alerts

PrivateRouter gives you Stripe-grade observability for every API key: per-key usage, latency, cost, and alerts when a key approaches its spend cap.

What's tracked

For each API key, PrivateRouter records every request as a usage_event row with token counts, cost, latency, status code, and the served model name. The drill-down page at /keys/{id} aggregates these into:

Total requests, error count + rate
Total cost, month-to-date or over any window of 1-90 days
p50 / p95 latency
Top 5 models by request count + cost
Daily breakdown chart (requests + spend over time)

Privacy note: usage_events never store prompt or response content — only token counts + cost + latency. The chat-message content is encrypted at rest separately (see /docs/observability).

Spend caps

Two caps per key:

Monthly cap — monthly_limit_usd. Used as the baseline for alert thresholds (each threshold is a percentage of this cap). Alerts are only evaluated when it is set.
Daily cap — daily_limit_usd. Hard pre-flight reject with 402 daily_cap_exceeded when today's spend would exceed it. Resets at 00:00 UTC.

Both are nullable. Set via UI on /keys/{id} or PATCH /api/keys/{id}:

curl -X PATCH https://api.privaterouter.com/api/keys/{id} \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"monthly_limit_usd": 50, "daily_limit_usd": 5}'

Pass null to clear.
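
Because the daily cap resets at 00:00 UTC, a client that receives 402 daily_cap_exceeded can work out how long to pause before retrying. A minimal sketch; the function name is illustrative and not part of the PrivateRouter API:

```python
from datetime import datetime, timedelta, timezone


def seconds_until_daily_reset(now: datetime) -> float:
    """Seconds from `now` until the next 00:00 UTC, when the daily cap resets.

    `now` must be timezone-aware; a naive datetime would be interpreted
    as local time by astimezone().
    """
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()
```

For example, at 2026-05-17T18:30:00Z the function returns 19800.0 (5.5 hours).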

Alert subscriptions

Two delivery channels: email and webhook.

Email alerts

curl -X POST https://api.privaterouter.com/api/keys/{id}/alerts \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "email",
    "destination": "alerts@yourcompany.com",
    "thresholds_pct": [50, 75, 90, 100]
  }'

Emails are sent via SMTP (operator-configured) with a branded dark-theme template + plaintext fallback. Subject: [PrivateRouter] {key_name} hit {N}% of monthly spend.

Webhook alerts

curl -X POST https://api.privaterouter.com/api/keys/{id}/alerts \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "webhook",
    "destination": "https://your-pager.example.com/pr-spend",
    "thresholds_pct": [50, 75, 90, 100]
  }'

Webhook payload shape:

{
  "type": "spend.threshold",
  "key_id": "uuid",
  "key_prefix": "sk-pr-AbC1...xyz9",
  "threshold_pct": 75,
  "billing_month": "2026-05",
  "mtd_spend_usd": "37.51",
  "monthly_limit_usd": "50.00",
  "fired_at": "2026-05-17T18:30:00Z"
}

Headers on every webhook POST:

Content-Type: application/json
User-Agent: PrivateRouter-Webhook/1.0
X-PrivateRouter-Event: spend.threshold
X-PrivateRouter-Signature: sha256=<hex> — HMAC-SHA256 of the JSON body. Verify on your end with the webhook secret (set per environment by the PrivateRouter operator).
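
Verifying the signature on the receiving side might look like the sketch below. It assumes the HMAC is computed over the raw request body bytes exactly as received, that the secret is a UTF-8 string, and that the hex digest is lowercase; confirm these details with your PrivateRouter operator. The function name is illustrative.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an X-PrivateRouter-Signature header of the form 'sha256=<hex>'.

    Assumption: the digest covers the raw body bytes, so verify before
    parsing or re-serializing the JSON.
    """
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    received = header_value[len(prefix):]
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Reject the request (for example with a 401) when this returns False, and only then parse the body.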

Retry policy:

2xx → success, no retry
4xx → caller error; we don't retry (don't hammer a misconfigured URL)
5xx or connection error → 3 attempts total with exponential backoff (0.5s, 1.5s, 4.5s)
Timeout per attempt: 5s
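
When testing a receiver, it can help to model how its response will be treated. The sketch below restates the rules above as a function; it is an illustration, not PrivateRouter's delivery code. Status codes outside 2xx/4xx/5xx are not covered by the policy and are reported as "unspecified". The listed backoff values follow the pattern 0.5 × 3^n seconds.

```python
from typing import Optional


def delivery_outcome(status_code: Optional[int]) -> str:
    """Classify one delivery attempt; None stands for a connection error."""
    if status_code is None:
        return "retry"
    if 200 <= status_code < 300:
        return "success"
    if 400 <= status_code < 500:
        return "no_retry"
    if 500 <= status_code < 600:
        return "retry"
    return "unspecified"  # 1xx/3xx are not described by the documented policy


# The documented delays: 0.5s, 1.5s, 4.5s.
BACKOFF_SECONDS = [0.5 * 3 ** n for n in range(3)]
```

Since each attempt times out after 5s, a receiver should acknowledge with a 2xx quickly and do any slow work afterwards.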

Threshold semantics

Thresholds are percentages of the monthly cap. A subscription with [50, 75, 90, 100] fires once each as MTD spend crosses each level. Once a threshold fires for a given calendar month, it won't fire again until the next month — dedupe is per (key_id, YYYY-MM, threshold_pct).

thresholds_pct accepts up to 5 integers between 1 and 100. A typical choice is any subset of [25, 50, 75, 90, 100].
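
The semantics above can be sketched as a small function. This is an illustration under stated assumptions, not the server's implementation: a threshold counts as crossed when MTD spend is greater than or equal to that percentage of the cap, and already_fired stands for the thresholds already recorded for this key and billing month (the dedupe key).

```python
from decimal import Decimal
from typing import Iterable, List, Optional, Set


def thresholds_to_fire(
    mtd_spend_usd: str,
    monthly_limit_usd: Optional[str],
    thresholds_pct: Iterable[int],
    already_fired: Set[int],
) -> List[int]:
    """Return the thresholds that should fire now, lowest first."""
    if monthly_limit_usd is None:
        return []  # no monthly cap set, nothing to compare against
    limit = Decimal(monthly_limit_usd)
    if limit <= 0:
        return []
    pct_used = Decimal(mtd_spend_usd) / limit * 100
    return [
        t for t in sorted(thresholds_pct)
        if pct_used >= t and t not in already_fired
    ]
```

With the webhook payload values shown earlier (37.51 spent against a 50.00 cap, about 75.02%), thresholds_to_fire("37.51", "50.00", [50, 75, 90, 100], {50}) returns [75].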

Alert event audit log

Every fired alert lives in key_alert_events and is visible at:

GET /api/keys/{id}/alert-events?limit=50

Shape:

[
  {
    "id": "uuid",
    "threshold_pct": 75,
    "billing_month": "2026-05",
    "fired_at": "2026-05-17T18:30:00Z",
    "delivery_status": "sent",
    "response_code": 200,
    "error_message": null
  }
]

delivery_status values:

sent — successfully delivered
failed — delivery attempted and failed (see error_message)
degraded — SMTP not configured on this PrivateRouter instance, alert recorded but not emailed
pending — in-flight (rare; visible during a brief window)
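
When triaging the audit log, a quick tally by delivery_status shows whether alerts are being delivered. A minimal sketch that operates on the already-decoded JSON array from the endpoint above; the function names are illustrative:

```python
from collections import Counter
from typing import Dict, List


def tally_deliveries(events: List[dict]) -> Dict[str, int]:
    """Count alert events per delivery_status value."""
    return dict(Counter(e["delivery_status"] for e in events))


def undelivered(events: List[dict]) -> List[dict]:
    """Events that were recorded but did not reach their destination."""
    return [e for e in events if e["delivery_status"] in ("failed", "degraded")]
```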

Reading analytics

GET /api/keys/{id}/analytics?window_days=7

{
  "window_days": 7,
  "total_requests": 1234,
  "error_count": 12,
  "error_rate": 0.0097,
  "p50_latency_ms": 340,
  "p95_latency_ms": 1280,
  "total_cost_usd": "2.4731",
  "total_tokens_in": 45000,
  "total_tokens_out": 38000,
  "top_models": [
    {"model_public_name": "privaterouter/qwen-pro", "requests": 800, "cost_usd": "1.9000"},
    {"model_public_name": "privaterouter/fast", "requests": 434, "cost_usd": "0.5731"}
  ],
  "daily_breakdown": [
    {"date": "2026-05-11", "requests": 180, "errors": 1, "cost_usd": "0.34"},
    ...
  ]
}

window_days accepts 1-90. Daily breakdown is back-filled with zero entries for days with no traffic, so the array length always equals window_days.
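
Cost fields are returned as strings, so parsing them with Decimal rather than float avoids rounding drift when summing. The sketch below derives a few figures from a decoded analytics response; the function and output key names are illustrative, not part of the API.

```python
from decimal import Decimal


def summarize(analytics: dict) -> dict:
    """Derive per-request cost and sanity-check figures from an analytics response."""
    requests = analytics["total_requests"]
    total_cost = Decimal(analytics["total_cost_usd"])
    top_models_cost = sum(
        (Decimal(m["cost_usd"]) for m in analytics["top_models"]), Decimal("0")
    )
    return {
        # Recomputed from the counts; should match the reported error_rate.
        "error_rate": round(analytics["error_count"] / requests, 4) if requests else 0.0,
        "cost_per_request_usd": total_cost / requests if requests else Decimal("0"),
        "top_models_cost_usd": top_models_cost,
        # Expected to equal window_days because empty days are zero-filled.
        "days_reported": len(analytics["daily_breakdown"]),
    }
```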

When alerts fire

Alerts are evaluated as a background task after every successful chat-completion or embeddings request — there's no polling delay. Practically: an alert email arrives within a few seconds of the threshold crossing, regardless of which model served the request.

When alerts don't fire

Key has no monthly_limit_usd set — there's nothing to compare against; the alerts are silent.
Subscription is active: false.
Threshold already fired this calendar month.

The audit log (/api/keys/{id}/alert-events) is the ground truth for "did my alert fire and what happened?" — start there if you're troubleshooting.