Per-key analytics + spend alerts
PrivateRouter gives you Stripe-grade observability for every API key: per-key usage, latency, cost, and alerts when a key approaches its spend cap.
What's tracked
For each API key, PrivateRouter records every request as a usage_event
row with token counts, cost, latency, status code, and the served model
name. The drill-down page at /keys/{id} aggregates these into:
- Total requests, error count + rate
- Total cost month-to-date (and any window 1-90 days)
- p50 / p95 latency
- Top 5 models by request count + cost
- Daily breakdown chart (requests + spend over time)
Privacy note: usage_events never store prompt or response content — only token counts + cost + latency. The chat-message content is encrypted at rest separately (see /docs/observability).
Spend caps
Two caps per key:
- Monthly cap: monthly_limit_usd. Used to compute alert thresholds (% of monthly spend). Setting this enables alerts.
- Daily cap: daily_limit_usd. Hard pre-flight reject with 402 daily_cap_exceeded when today's spend would exceed it. Resets at 00:00 UTC.
Both are nullable. Set via UI on /keys/{id} or PATCH /api/keys/{id}:
curl -X PATCH https://api.privaterouter.com/api/keys/{id} \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"monthly_limit_usd": 50, "daily_limit_usd": 5}'
Pass null to clear.
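Because the daily cap resets at 00:00 UTC, a client that receives a 402 daily_cap_exceeded can work out how long to wait before trying again. The following is a minimal sketch of that arithmetic; the function name next_daily_reset is hypothetical and not part of PrivateRouter.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Return the next 00:00 UTC boundary strictly after `now`.

    `now` must be timezone-aware; a naive datetime is rejected so that
    local time is never silently mistaken for UTC.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    tomorrow = now_utc.date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


# Example: a rejection seen at 18:30 UTC clears 5.5 hours later.
seen_at = datetime(2026, 5, 17, 18, 30, tzinfo=timezone.utc)
wait_seconds = (next_daily_reset(seen_at) - seen_at).total_seconds()
```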
Alert subscriptions
Two delivery channels: email and webhook.
Email alerts
curl -X POST https://api.privaterouter.com/api/keys/{id}/alerts \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "email",
    "destination": "alerts@yourcompany.com",
    "thresholds_pct": [50, 75, 90, 100]
  }'
Emails are sent via SMTP (operator-configured) with a branded dark-theme
template + plaintext fallback. Subject: [PrivateRouter] {key_name} hit {N}% of monthly spend.
Webhook alerts
curl -X POST https://api.privaterouter.com/api/keys/{id}/alerts \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "webhook",
    "destination": "https://your-pager.example.com/pr-spend",
    "thresholds_pct": [50, 75, 90, 100]
  }'
Webhook payload shape:
{
  "type": "spend.threshold",
  "key_id": "uuid",
  "key_prefix": "sk-pr-AbC1...xyz9",
  "threshold_pct": 75,
  "billing_month": "2026-05",
  "mtd_spend_usd": "37.51",
  "monthly_limit_usd": "50.00",
  "fired_at": "2026-05-17T18:30:00Z"
}
Headers on every webhook POST:
- Content-Type: application/json
- User-Agent: PrivateRouter-Webhook/1.0
- X-PrivateRouter-Event: spend.threshold
- X-PrivateRouter-Signature: sha256=<hex>
The signature is the HMAC-SHA256 of the JSON body. Verify it on your end with the webhook secret (set per environment by the PrivateRouter operator).
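A receiver can check the X-PrivateRouter-Signature header roughly as follows. This is a sketch under assumptions the text above does not spell out: that the HMAC is computed over the exact raw request bytes, that the hex digest is lowercase, and that the secret is available as bytes. The function name verify_signature is illustrative.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: bytes) -> bool:
    """Return True if the header equals "sha256=" + HMAC-SHA256(secret, raw_body)."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    received = signature_header[len(prefix):].encode("utf-8")
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest().encode("utf-8")
    # compare_digest runs in constant time, so a mismatch leaks no position info.
    return hmac.compare_digest(expected, received)
```

Verify against the bytes exactly as received, before any JSON parsing: re-serializing the parsed payload can change whitespace or key order and break the comparison.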
Retry policy:
- 2xx → success, no retry
- 4xx → caller error; we don't retry (don't hammer a misconfigured URL)
- 5xx or connection error → 3 attempts total with exponential backoff (0.5s, 1.5s, 4.5s)
- Timeout per attempt: 5s
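The retry rules above can be sketched as two small functions. The delay schedule follows the listed values (0.5s, then tripling); how those delays map onto the three attempts is not specified here, so treat that as an assumption. Treating a timeout like a connection error, and not retrying 3xx responses, are also assumptions of this sketch.

```python
from typing import List, Optional


def should_retry(status_code: Optional[int]) -> bool:
    """Decide whether a delivery attempt should be retried.

    `None` stands for a connection error (and, by assumption, a timeout).
    """
    if status_code is None:
        return True
    # 2xx is success and 4xx is a caller error; only 5xx is retried.
    return 500 <= status_code <= 599


def backoff_delays(base: float = 0.5, factor: float = 3.0, count: int = 3) -> List[float]:
    """Exponential backoff schedule: base, base*factor, base*factor^2, ..."""
    return [base * factor ** i for i in range(count)]
```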
Threshold semantics
Thresholds are percentages of the monthly cap. A subscription with
[50, 75, 90, 100] fires once each as MTD spend crosses each level.
Once a threshold fires for a given calendar month, it won't fire again
until the next month — dedupe is per (key_id, YYYY-MM, threshold_pct).
You can configure any subset of these thresholds:
[25, 50, 75, 90, 100] (max 5 entries, integers 1-100).
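The dedupe rule can be illustrated with a short sketch. It is not PrivateRouter's implementation: the names validate_thresholds and newly_crossed are hypothetical, an in-memory set stands in for whatever store holds fired thresholds, and "crosses" is read as "spend is at or above the level". The validator enforces only the stated limits (at most 5 entries, integers 1-100).

```python
from decimal import Decimal
from typing import List, Optional, Set, Tuple

FiredKey = Tuple[str, str, int]  # (key_id, "YYYY-MM", threshold_pct)


def validate_thresholds(thresholds: List[int]) -> None:
    """Raise ValueError unless there are at most 5 integer entries in 1..100."""
    if len(thresholds) > 5:
        raise ValueError("at most 5 thresholds are allowed")
    for t in thresholds:
        if isinstance(t, bool) or not isinstance(t, int) or not 1 <= t <= 100:
            raise ValueError(f"invalid threshold: {t!r}")


def newly_crossed(
    key_id: str,
    billing_month: str,
    mtd_spend: Decimal,
    monthly_limit: Optional[Decimal],
    thresholds: List[int],
    already_fired: Set[FiredKey],
) -> List[int]:
    """Return thresholds to fire now, recording each so it fires once per month."""
    if monthly_limit is None or monthly_limit <= 0:
        return []  # no monthly cap means nothing to compare against
    pct = mtd_spend / monthly_limit * 100
    due = []
    for t in sorted(thresholds):
        dedupe_key = (key_id, billing_month, t)
        if pct >= t and dedupe_key not in already_fired:
            already_fired.add(dedupe_key)
            due.append(t)
    return due
```

With the payload values shown earlier (37.51 of 50.00, about 75%), the first evaluation yields the 50 and 75 thresholds; evaluating again in the same month yields nothing.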
Alert event audit log
Every fired alert lives in key_alert_events and is visible at:
GET /api/keys/{id}/alert-events?limit=50
Shape:
[
  {
    "id": "uuid",
    "threshold_pct": 75,
    "billing_month": "2026-05",
    "fired_at": "2026-05-17T18:30:00Z",
    "delivery_status": "sent",
    "response_code": 200,
    "error_message": null
  }
]
delivery_status values:
- sent: successfully delivered
- failed: delivery attempted and failed (see error_message)
- degraded: SMTP not configured on this PrivateRouter instance; alert recorded but not emailed
- pending: in-flight (rare; visible during a brief window)
Reading analytics
GET /api/keys/{id}/analytics?window_days=7
{
  "window_days": 7,
  "total_requests": 1234,
  "error_count": 12,
  "error_rate": 0.0097,
  "p50_latency_ms": 340,
  "p95_latency_ms": 1280,
  "total_cost_usd": "2.4731",
  "total_tokens_in": 45000,
  "total_tokens_out": 38000,
  "top_models": [
    {"model_public_name": "privaterouter/qwen-pro", "requests": 800, "cost_usd": "1.9000"},
    {"model_public_name": "privaterouter/fast", "requests": 434, "cost_usd": "0.5731"}
  ],
  "daily_breakdown": [
    {"date": "2026-05-11", "requests": 180, "errors": 1, "cost_usd": "0.34"},
    ...
  ]
}
window_days accepts 1-90. Daily breakdown is back-filled with zero
entries for days with no traffic, so the array length always equals
window_days.
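Cost fields in the response are strings, so a consumer will likely want to parse them as decimals rather than floats. The sketch below shows one way to summarize an already-decoded response; the function name summarize and the returned keys are illustrative, and the sample data is made up for the example.

```python
from decimal import Decimal
from typing import Any, Dict


def summarize(analytics: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a few figures from a decoded /analytics response body."""
    total_cost = Decimal(analytics["total_cost_usd"])
    requests = analytics["total_requests"]
    days = analytics["daily_breakdown"]
    return {
        # Decimal keeps the cost arithmetic exact; guard against zero traffic.
        "avg_cost_per_request": total_cost / requests if requests else Decimal("0"),
        # The breakdown is zero-filled, so its length should match the window.
        "breakdown_complete": len(days) == analytics["window_days"],
        "idle_days": sum(1 for d in days if d["requests"] == 0),
    }


sample = {
    "window_days": 2,
    "total_requests": 4,
    "total_cost_usd": "0.5000",
    "daily_breakdown": [
        {"date": "2026-05-11", "requests": 4, "errors": 1, "cost_usd": "0.50"},
        {"date": "2026-05-12", "requests": 0, "errors": 0, "cost_usd": "0"},
    ],
}
summary = summarize(sample)
```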
When alerts fire
Alerts are evaluated as a background task after every successful chat-completion or embeddings request — there's no polling delay. Practically: an alert email arrives within a few seconds of the threshold crossing, regardless of which model served the request.
When alerts don't fire
- Key has no monthly_limit_usd set: there's nothing to compare against, so alerts stay silent.
- Subscription is active: false.
- Threshold already fired this calendar month.
The audit log (/api/keys/{id}/alert-events) is the ground truth for
"did my alert fire and what happened?" — start there if you're
troubleshooting.
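When troubleshooting from the audit log, a quick first pass is to count events by delivery_status and pull out the ones that did not reach their destination. A minimal sketch over an already-decoded alert-events response follows; the function name triage and the choice to flag failed and degraded (but not pending) are assumptions of this example.

```python
from collections import Counter
from typing import Any, Dict, List

# Statuses that mean the alert was recorded but not delivered.
NEEDS_ATTENTION = {"failed", "degraded"}


def triage(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Group alert events by status and list the undelivered ones."""
    counts = Counter(e["delivery_status"] for e in events)
    problems = [e for e in events if e["delivery_status"] in NEEDS_ATTENTION]
    return {"counts": dict(counts), "problems": problems}
```

Each entry in problems keeps its response_code and error_message, which are usually the next things to inspect.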