Observability — Prometheus + Grafana
PrivateRouter ships with an opt-in observability stack: Prometheus scrapes
the API's /metrics endpoint and Grafana displays a pre-provisioned
dashboard. Both run under a docker-compose profile, so they use no
resources unless you start them explicitly.
Start the stack
docker compose --profile observability up -d
This adds two containers alongside the existing api, web, postgres,
redis, and litellm services:
| Service | URL | Default credentials |
|---|---|---|
| Prometheus | http://localhost:9090 | none (read-only) |
| Grafana | http://localhost:3001 | admin / admin |
To use a different Grafana admin password, set GRAFANA_ADMIN_PASSWORD in
.env before the first start. If you keep the default, Grafana prompts you
to change it after the first login.
Stop the stack
docker compose --profile observability stop prometheus grafana
# or to remove containers (volumes survive):
docker compose --profile observability rm -sf prometheus grafana
The base stack (api/web/postgres/redis/litellm) is unaffected.
What gets collected
The API emits the following custom Prometheus metrics (all prefixed pr_):
HTTP request metrics (every API request)
- pr_http_requests_total{method,route,status}: counter
- pr_http_request_duration_seconds{method,route,status}: histogram
- pr_http_requests_in_flight{method,route}: gauge
- pr_http_exceptions_total{method,route,exception_type}: counter
route is the matched FastAPI route template (e.g. /api/keys/{id}),
not the raw path. This keeps label cardinality bounded.
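To see why the template matters, the sketch below labels the same traffic two ways and counts the distinct route values that result. It is illustrative only: to_template is a hypothetical stand-in for FastAPI's route matching, and the regex and sample paths are assumptions, not PrivateRouter code.

```python
import re

# Hypothetical stand-in for FastAPI's route matching: collapse the ID
# segment of /api/keys/<id> into the template /api/keys/{id}.
_KEY_PATH = re.compile(r"^/api/keys/[^/]+$")


def to_template(raw_path: str) -> str:
    """Map a raw request path to its route template (illustrative only)."""
    if _KEY_PATH.match(raw_path):
        return "/api/keys/{id}"
    return raw_path


# Made-up traffic: 1,000 requests, each for a different key ID.
raw_paths = [f"/api/keys/{n}" for n in range(1000)]

# Each distinct label value becomes its own time series in Prometheus.
series_by_raw_path = len(set(raw_paths))
series_by_template = len({to_template(p) for p in raw_paths})

print(series_by_raw_path)   # 1000
print(series_by_template)   # 1
```

With the raw path as the label, the series count grows with the number of keys; with the template, it stays at one per route.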
LLM / completion metrics
- pr_llm_calls_total{model,endpoint,status}: counter
- pr_llm_tokens_total{model,direction}: counter (direction is input or output)
- pr_llm_cost_usd_total{model}: counter (running sum of usage row costs)
Domain counters
- pr_chat_conversations_created_total
- pr_credit_transactions_total{kind}: topup / usage / admin_adjust / refund
GPU node health
- pr_gpu_node_up{node_id,name}: gauge, 1 if healthy
- pr_gpu_node_utilization_pct{node_id,name}: gauge, 0-100
User identity (user_id) is never a Prometheus label — high
cardinality would blow up the time series database. Per-user correlation
lives in the structured JSON access log instead (request_id + user_id).
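For a quick check without Grafana, the /metrics output can be read directly. The sketch below is a minimal parser for the Prometheus text exposition format that lists GPU nodes reporting pr_gpu_node_up as 0. The SAMPLE text, including its HELP string and node names, is made up for illustration, and the parser handles only simple sample lines.

```python
import re

# One sample line: name, optional {labels}, then the value.
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or # HELP / # TYPE comment
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def down_gpu_nodes(text):
    """Return the name label of every node whose pr_gpu_node_up is 0."""
    return [
        labels.get("name", labels.get("node_id", "?"))
        for name, labels, value in parse_samples(text)
        if name == "pr_gpu_node_up" and value == 0
    ]


# Made-up exposition text, not real PrivateRouter output.
SAMPLE = """
# HELP pr_gpu_node_up 1 if the GPU node is healthy
# TYPE pr_gpu_node_up gauge
pr_gpu_node_up{node_id="1",name="gpu-a"} 1.0
pr_gpu_node_up{node_id="2",name="gpu-b"} 0.0
pr_gpu_node_utilization_pct{node_id="1",name="gpu-a"} 73.5
"""

print(down_gpu_nodes(SAMPLE))  # ['gpu-b']
```

In practice the text would come from the API's /metrics endpoint rather than a string literal.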
Structured access logs
Every HTTP request emits one JSON line to stdout via the pr.access logger:
{
"ts": 1778985270.62,
"request_id": "c51fcef36f24437dba83982590ba5763",
"method": "GET",
"route": "/api/billing/balance",
"raw_path": "/api/billing/balance",
"status": 401,
"latency_ms": 0.89,
"user_id": null,
"ip": "172.24.0.6",
"user_agent": "Mozilla/5.0...",
"bytes": 30
}
The same request_id is echoed back to the client as X-Request-ID. To
correlate a customer report to a log line:
docker logs privaterouter-api 2>&1 | grep '"request_id": "c51fcef3..."'
Clients can also send their own X-Request-ID header (e.g. from upstream
tracing); the server will use it verbatim.
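A text match with grep depends on the exact spacing of the serialized JSON. As an alternative, the sketch below parses each log line and compares the request_id field. The function name find_request is hypothetical, and the sample record is a shortened copy of the example above.

```python
import json


def find_request(lines, request_id):
    """Return the parsed access-log records whose request_id matches."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output such as startup messages
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if isinstance(record, dict) and record.get("request_id") == request_id:
            matches.append(record)
    return matches


# Two lines of stand-in log output: one non-JSON line and one access record.
sample_lines = [
    "some non-JSON log line",
    '{"request_id": "c51fcef36f24437dba83982590ba5763", '
    '"route": "/api/billing/balance", "status": 401, "user_id": null}',
]

hits = find_request(sample_lines, "c51fcef36f24437dba83982590ba5763")
print(hits[0]["route"], hits[0]["status"])  # /api/billing/balance 401
```

The same function works on live output by passing sys.stdin as lines and piping docker logs privaterouter-api 2>&1 into the script.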
The Grafana dashboard
The pre-provisioned PrivateRouter — Overview dashboard
(infra/observability/grafana/dashboards/privaterouter-overview.json)
has 12 panels organised as:
- Top-row stat tiles: rps, error rate, p95 latency, in-flight
- Latency percentiles (p50/p95/p99) over time
- Requests/sec split by status class (2xx / 4xx / 5xx)
- Top 10 routes by request volume + top 10 by p95 latency
- LLM tokens/sec by model and direction
- LLM spend (USD/hour) by model
- GPU node up/down and utilization %
To add panels, edit in Grafana — allowUiUpdates: true is set so changes
persist in the Grafana volume. Export and commit the JSON back to
infra/observability/grafana/dashboards/ when ready to share.
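When prototyping a new panel, it can help to try the expression against Prometheus's HTTP API first. The sketch below builds instant-query URLs for two expressions. The PromQL is an assumption derived from the metric names documented above (including the assumption that status holds the numeric HTTP code); the shipped dashboard JSON may use different expressions.

```python
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # Prometheus URL from the table above

# Assumed PromQL; not copied from the provisioned dashboard.
P95_LATENCY = (
    "histogram_quantile(0.95, "
    "sum by (le) (rate(pr_http_request_duration_seconds_bucket[5m])))"
)
ERROR_RATE = (
    'sum(rate(pr_http_requests_total{status=~"5.."}[5m])) '
    "/ sum(rate(pr_http_requests_total[5m]))"
)


def instant_query_url(promql, base_url=PROMETHEUS_URL):
    """Build a URL for Prometheus's instant-query endpoint, /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


print(instant_query_url(P95_LATENCY))
print(instant_query_url(ERROR_RATE))
```

Opening either URL in a browser, or fetching it with urllib.request.urlopen, returns a JSON document whose data.result field holds the current value.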
Production hardening (when you put this on the internet)
The defaults are sane for development. Before exposing publicly:
- Bind /metrics to the internal network only. Currently the API serves it on 8000 alongside the public API. Put nginx in front and block /metrics from the outside, OR run a separate uvicorn worker on a different port for metrics.
- Rotate GRAFANA_ADMIN_PASSWORD from the default admin.
- Enable HTTPS on the Prometheus + Grafana ingress.
- Increase TSDB retention from the default 15d (--storage.tsdb.retention.time=...).
- Set up alerting rules in Prometheus (infra/observability/prometheus.yml) or Grafana for high error rates, latency spikes, and GPU node down.
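One way to verify the first item is to request /metrics from outside the internal network and confirm it is not served. The sketch below is a minimal check under that assumption; metrics_is_exposed is a hypothetical helper, and the base URL you pass should be your own public hostname.

```python
import urllib.error
import urllib.request


def metrics_is_exposed(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/metrics answers with HTTP 200."""
    url = base_url.rstrip("/") + "/metrics"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # An error status such as 403 or 404 means the path is blocked.
        return False
    except OSError:
        # Connection refused, DNS failure, or timeout: not reachable from here.
        return False
```

Run it from a machine outside the deployment; a True result means the endpoint is still publicly readable.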
LiteLLM metrics
The Prometheus config includes a litellm scrape job pointed at
http://litellm:4000/metrics. LiteLLM's own metrics endpoint requires
the master key as a bearer token; the scrape job is left as a
down/unknown target until that's configured (next step). See the
LiteLLM Prometheus docs to enable it.
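Before wiring the key into the scrape job, it may be worth confirming that the endpoint accepts it. The sketch below builds the authenticated request; litellm_metrics_request is a hypothetical helper, and reading the key from an environment variable named LITELLM_MASTER_KEY is an assumption about how your .env is laid out.

```python
import os
import urllib.request


def litellm_metrics_request(master_key, url="http://litellm:4000/metrics"):
    """Build (but do not send) a GET request carrying the key as a bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {master_key}"},
    )


# Assumed variable name; falls back to a placeholder so the sketch still runs.
key = os.environ.get("LITELLM_MASTER_KEY", "sk-placeholder")
request = litellm_metrics_request(key)
print(request.full_url)  # http://litellm:4000/metrics
```

Passing the request to urllib.request.urlopen from a container on the compose network sends it; the litellm hostname does not resolve from the host machine.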