PrivateRouter API Quickstart (M2 dev)
Zero to first OpenAI-compatible request in about five minutes, using per-user API keys and the /v1/* gateway that lands in Milestone 2.
In M2 you never talk to LiteLLM directly. You talk to FastAPI at http://localhost:8000/v1, authenticated by an sk-pr-... API key minted from the dashboard or the /api/keys endpoint. FastAPI checks your balance, forwards the request to LiteLLM using a per-user virtual key, and streams the response back. In the background it then records a usage event and deducts credits.
Step 1 — Start the stack
git clone <repo-url> privaterouter
cd privaterouter
cp .env.example .env
# Edit .env and set strong values for:
# APP_SECRET_KEY, JWT_SECRET, LITELLM_MASTER_KEY, LITELLM_SALT_KEY,
# POSTGRES_PASSWORD
docker compose up -d
On first boot the api container automatically runs alembic upgrade head and then python -m scripts.seed, which upserts the 5 plans and 6 models. Subsequent restarts converge idempotently.
Sanity check:
docker compose ps
curl -fsS http://localhost:8000/health
curl -fsS http://127.0.0.1:4010/health/readiness
Both should return {"status": "ok"} (or healthy).
Step 2 — Sign up
Option A: dashboard (preferred)
Open http://localhost:3000/signup in a browser, fill in email + password, and submit. The dashboard sets an httpOnly cookie and drops you on /dashboard. An Account row is created automatically and you're put on the free plan ($1.00 of starter credits).
Option B: curl
curl -sS -X POST http://localhost:8000/api/auth/signup \
  -H 'Content-Type: application/json' \
  -d '{
    "email": "alice@example.com",
    "password": "correct-horse-battery-staple",
    "name": "Alice"
  }'
Response shape:
{
  "access_token": "eyJhbG...VCJ9...",
  "token_type": "bearer",
  "user": {
    "id": "8f2b1c4e-1a3d-4f5e-9b6a-2c0d4e7f8a91",
    "email": "alice@example.com",
    "name": "Alice",
    "role": "user",
    "status": "active",
    "created_at": "2026-05-15T06:23:00.123456Z"
  }
}
Signup also sets the same pr_session httpOnly cookie that /login does, so later /api/* calls can authenticate either with an Authorization: Bearer header carrying the access_token or by reusing the cookie jar.
Capture the JWT for the rest of this guide:
TOKEN=$(curl -sS -X POST http://localhost:8000/api/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"email":"alice@example.com","password":"correct-horse-battery-staple"}' \
  | jq -r .access_token)
Verify it:
curl -sS http://localhost:8000/api/auth/me -H "Authorization: Bearer $TOKEN"
Step 3 — Top up credits
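The free plan ships with $1.00 of credits. At the prices listed for privaterouter/fast ($0.10 per million input tokens, $0.30 per million output tokens), that covers roughly 3.3M output tokens or 10M input tokens. Top up for more headroom; which models you can call depends on your plan's allow-list, not on your balance.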
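As a rough check on how far the starter credits go, the sketch below divides a $1.00 budget by the per-million prices listed for privaterouter/fast in the model table of this guide. The helper name tokens_for_budget is illustrative and not part of the API.

```python
from decimal import Decimal

# Prices for privaterouter/fast in USD per million tokens (from the model table).
INPUT_PER_M = Decimal("0.100000")
OUTPUT_PER_M = Decimal("0.300000")


def tokens_for_budget(budget_usd: Decimal, price_per_million: Decimal) -> int:
    """Whole tokens a budget buys at a flat per-million price (hypothetical helper)."""
    return int(budget_usd / price_per_million * 1_000_000)


budget = Decimal("1.00")
input_only = tokens_for_budget(budget, INPUT_PER_M)    # all credits spent on input
output_only = tokens_for_budget(budget, OUTPUT_PER_M)  # all credits spent on output
print(input_only, output_only)
```

Real traffic mixes input and output tokens, so the practical figure falls somewhere between the two bounds.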
Via the dashboard
Visit http://localhost:3000/billing, click $10, and your balance updates instantly.
Via curl
curl -sS -X POST http://localhost:8000/api/billing/topup \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"amount_usd": 10}'
Allowed amounts: 10, 25, 50, 100, 500 (USD). Response:
{
  "balance_usd": "11.00",
  "plan_name": "free",
  "monthly_quota_tokens": null
}
Mock topup. No card is charged in M2. Real Stripe Checkout lands in M5.
Step 4 — Create an API key
Via the dashboard
- Go to http://localhost:3000/keys
- Click Create new key and give it a name (e.g. laptop)
- The sk-pr-... value is shown once, so copy it now. The dashboard only stores the prefix and a hash; we cannot show it again.
Via curl
curl -sS -X POST http://localhost:8000/api/keys \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name": "laptop"}'
Response:
{
  "key": "sk-pr-live-7Q2f...g8vM",
  "api_key": {
    "id": "1c9e9a4f-...",
    "name": "laptop",
    "key_prefix": "sk-pr-live-7Q2f",
    "status": "active",
    "monthly_limit_usd": null,
    "last_used_at": null,
    "created_at": "2026-05-15T06:24:11.000000Z"
  }
}
The key field is the only place the full secret is returned. If you lose it, delete and re-create.
export PR_KEY="sk-pr-live-7Q2f...g8vM"
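The "prefix and a hash" storage mentioned in this step can be sketched as follows. This is an illustration only: the use of SHA-256, the 15-character prefix length (chosen to match the key_prefix in the sample response), and the function names are assumptions, not the gateway's actual implementation.

```python
import hashlib
import hmac

PREFIX_LEN = 15  # assumption: len("sk-pr-live-7Q2f"), as in the sample response


def to_stored_record(full_key: str) -> dict:
    """Derive what a server could persist instead of the secret itself."""
    return {
        "key_prefix": full_key[:PREFIX_LEN],
        "key_hash": hashlib.sha256(full_key.encode("utf-8")).hexdigest(),
    }


def matches(presented_key: str, record: dict) -> bool:
    """Compare the presented key's hash with the stored hash in constant time."""
    digest = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, record["key_hash"])


# A made-up key for illustration; it is not a real credential.
record = to_stored_record("sk-pr-live-EXAMPLE-not-a-real-key")
print(record["key_prefix"])
```

Because only a one-way hash is kept, the full secret cannot be recovered from the stored record, which is why a lost key has to be re-created.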
Step 5 — Your first inference request
Every example below points at http://localhost:8000/v1 and authenticates with sk-pr-.... The path is OpenAI-compatible — same shape as api.openai.com/v1/chat/completions.
curl
curl -sS http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/fast",
    "messages": [{"role": "user", "content": "Say hi in five words."}]
  }'
Python (openai SDK)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-pr-live-7Q2f...g8vM",
)
resp = client.chat.completions.create(
    model="privaterouter/fast",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(resp.choices[0].message.content)
Node (openai SDK)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: "sk-pr-live-7Q2f...g8vM",
});
const resp = await client.chat.completions.create({
  model: "privaterouter/fast",
  messages: [{ role: "user", content: "Say hi in five words." }],
});
console.log(resp.choices[0].message.content);
You'll get a standard OpenAI chat.completion envelope. A UsageEvent row is written in the background and your balance is debited within a few hundred ms.
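The SDK examples above hard-code the key for brevity. A minimal stdlib-only sketch that reads it from the PR_KEY environment variable exported in Step 4 and builds the same request might look like this. build_chat_request is a hypothetical helper name, and nothing is sent until you pass the result to urlopen.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = "privaterouter/fast",
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a chat completion request (hypothetical helper)."""
    key = api_key or os.environ.get("PR_KEY", "")
    if not key.startswith("sk-pr-"):
        raise ValueError("expected an sk-pr-... API key, e.g. from $PR_KEY")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it against a running stack:
#   with urllib.request.urlopen(build_chat_request("Say hi in five words.")) as r:
#       print(json.load(r)["choices"][0]["message"]["content"])
```

Keeping the secret in the environment avoids committing it alongside the code.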
Step 6 — Streaming
Add "stream": true and consume the SSE event stream. Use -N with curl so it doesn't buffer:
curl -N -sS http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/fast",
    "stream": true,
    "messages": [{"role": "user", "content": "Write one haiku about TCP."}]
  }'
Output is the standard OpenAI SSE format:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hand"},"index":0}], ...}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"shake"},"index":0}], ...}
...
data: [DONE]
Python:
stream = client.chat.completions.create(
    model="privaterouter/fast",
    messages=[{"role": "user", "content": "Stream a haiku."}],
    stream=True,
)
for chunk in stream:
    # A trailing usage-only chunk may arrive with an empty choices list.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
Usage tokens are captured from the final chunk's usage block (LiteLLM injects stream_options.include_usage=true for accurate billing).
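If you consume the stream without the SDK, the SSE lines shown above can be reassembled by hand. The following is a minimal sketch, assuming each event arrives as a single "data: ..." line; collect_stream_text is an illustrative name, and the sample lines are made up to mirror the format in this step.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # A usage-only chunk has an empty choices list and adds no text.
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample = [
    'data: {"choices":[{"delta":{"content":"Hand"},"index":0}]}',
    "",
    'data: {"choices":[{"delta":{"content":"shake"},"index":0}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":9,"completion_tokens":2}}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Handshake
```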
Step 7 — Inspect your usage
Every /v1/* call writes a usage_event row in the background.
Via the dashboard
http://localhost:3000/usage shows a paginated table with timestamps, model, token counts, latency, and per-request cost.
Via curl
curl -sS "http://localhost:8000/api/usage?limit=20" \
  -H "Authorization: Bearer $TOKEN"
Response:
{
  "items": [
    {
      "id": "...",
      "model_public_name": "privaterouter/fast",
      "input_tokens": 14,
      "output_tokens": 9,
      "cost_usd": "0.0000041",
      "latency_ms": 412,
      "status": "success",
      "created_at": "2026-05-15T06:25:33Z"
    }
  ],
  "total": 1,
  "limit": 20,
  "offset": 0
}
The updated balance is available from /api/billing/balance.
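The total, limit, and offset fields in the response suggest ordinary offset pagination. A small sketch of walking every page follows; it assumes the endpoint accepts an offset query parameter mirroring the response field, which this guide does not confirm, and the helper names are illustrative.

```python
from typing import Iterator, List


def page_offsets(total: int, limit: int = 20) -> Iterator[int]:
    """Yield the offset of each page needed to cover `total` items."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    yield from range(0, total, limit)


def usage_urls(total: int, limit: int = 20,
               base: str = "http://localhost:8000/api/usage") -> List[str]:
    # Assumption: `offset` is accepted as a query parameter alongside `limit`.
    return [f"{base}?limit={limit}&offset={o}" for o in page_offsets(total, limit)]


for url in usage_urls(45):
    print(url)
```

In practice you would read total from the first response and then request the remaining pages.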
Step 8 — Embeddings
curl -sS http://localhost:8000/v1/embeddings \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/embed",
    "input": "Vectors are nice"
  }'
Returns an OpenAI-shaped envelope with a single 768-dim vector (nomic-embed-text). The embed model is on the free plan's allow-list, so it works out of the box.
Python:
emb = client.embeddings.create(
    model="privaterouter/embed",
    input="Vectors are nice",
)
print(len(emb.data[0].embedding))  # 768
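A common next step with embeddings is comparing two vectors. The sketch below shows plain cosine similarity in the standard library; it is a generic illustration rather than anything the gateway provides, and the toy 3-dim vectors stand in for the 768-dim ones returned above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```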
Available models
| Public name | Upstream model | Family | Context | Input $/M | Output $/M |
|---|---|---|---|---|---|
| privaterouter/fast | gemma4:latest | gemma | 8192 | 0.100000 | 0.300000 |
| privaterouter/qwen-fast | qwen3:latest | qwen | 32768 | 0.100000 | 0.300000 |
| privaterouter/qwen-pro | qwen3.6:latest | qwen | 32768 | 0.500000 | 1.500000 |
| privaterouter/deepseek-code | qwen3-coder:latest | deepseek | 32768 | 0.600000 | 1.800000 |
| privaterouter/deepseek-reason | deepseek-r1:latest | deepseek | 32768 | 0.300000 | 0.900000 |
| privaterouter/embed | nomic-embed-text (768d) | embedding | 8192 | 0.050000 | 0.000000 |
Prices are USD per million tokens, returned as decimal strings by the API to avoid float drift.
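To see how the per-million prices turn into a per-request charge, here is a minimal sketch using Decimal, in keeping with the decimal-string convention. The PRICES mapping copies two rows from the table, and request_cost is an illustrative helper, not the gateway's billing code.

```python
from decimal import Decimal

# (input, output) prices in USD per million tokens, copied from the table.
PRICES = {
    "privaterouter/fast": (Decimal("0.100000"), Decimal("0.300000")),
    "privaterouter/embed": (Decimal("0.050000"), Decimal("0.000000")),
}
MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request at the listed per-million prices (hypothetical helper)."""
    in_price, out_price = PRICES[model]
    return (in_price * input_tokens + out_price * output_tokens) / MILLION


# 14 input and 9 output tokens, as in the sample usage event in Step 7.
cost = request_cost("privaterouter/fast", 14, 9)
print(cost)
```

The result agrees with the cost_usd value in that sample event.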
List what your key can actually see:
curl -sS http://localhost:8000/v1/models -H "Authorization: Bearer $PR_KEY"
The list is filtered against your plan's allow-list.
Plan reference
| Plan | Monthly | Included credits | Allowed models | RPM |
|---|---|---|---|---|
| free | $0 | $1.00 | fast, embed | 20 |
| starter | $10 | $12.00 | fast, qwen-fast, embed | 60 |
| pro | $20 | $25.00 | fast, qwen-fast, qwen-pro, deepseek-code, embed | 120 |
| developer | $49 | $65.00 | all | 300 |
| team | $199 | $275.00 | all | 1000 |
See pricing.md for full per-model math and fair-use limits.
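The allow-list column can be expressed as a simple lookup, which is handy for failing fast on the client before a request returns model_not_allowed. The sketch below transcribes the plan table; the data structure and the is_allowed helper are illustrative, and the server remains the source of truth.

```python
# Allow-lists transcribed from the plan table; "all" means every public model.
ALL_MODELS = {"fast", "qwen-fast", "qwen-pro", "deepseek-code",
              "deepseek-reason", "embed"}
PLAN_MODELS = {
    "free": {"fast", "embed"},
    "starter": {"fast", "qwen-fast", "embed"},
    "pro": {"fast", "qwen-fast", "qwen-pro", "deepseek-code", "embed"},
    "developer": ALL_MODELS,
    "team": ALL_MODELS,
}


def is_allowed(plan: str, public_name: str) -> bool:
    """True if privaterouter/<name> is on the plan's allow-list (hypothetical helper)."""
    short = public_name.removeprefix("privaterouter/")  # Python 3.9+
    return short in PLAN_MODELS[plan]


print(is_allowed("free", "privaterouter/fast"))       # True
print(is_allowed("free", "privaterouter/qwen-fast"))  # False
```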
Error reference
All gateway errors follow the canonical OpenAI shape:
{ "error": { "message": "...", "type": "...", "code": "..." } }
| HTTP | type | code | When |
|---|---|---|---|
| 400 | invalid_request_error | invalid_body | Missing/invalid JSON, missing model field |
| 401 | invalid_request_error | invalid_api_key | Missing/malformed/unknown/disabled sk-pr-... key |
| 402 | insufficient_quota | insufficient_credits | Balance below $0.001 — top up at /billing |
| 403 | invalid_request_error | model_not_allowed | Model not in your plan's allow-list |
| 404 | invalid_request_error | endpoint_not_supported | Calling an unimplemented /v1/* route |
| 503 | upstream_error | upstream_unconfigured | API key has no LiteLLM virtual key (re-create) |
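Because every gateway error shares one envelope, a client can branch on the code field. A minimal sketch follows; the hint strings are this example's own wording (loosely based on the "When" column), not text returned by the API, and explain_error is a hypothetical helper.

```python
import json

# Suggested client reactions keyed by `code`; the wording is illustrative.
ACTIONS = {
    "invalid_body": "fix the request JSON",
    "invalid_api_key": "check that you sent an active sk-pr-... key",
    "insufficient_credits": "top up at /billing",
    "model_not_allowed": "pick a model on your plan's allow-list",
    "endpoint_not_supported": "use an implemented /v1/* route",
    "upstream_unconfigured": "re-create the API key",
}


def explain_error(body: str) -> str:
    """Turn an error envelope into a one-line summary with a suggested action."""
    err = json.loads(body).get("error", {})
    code = err.get("code", "")
    hint = ACTIONS.get(code, "see the error message")
    return f"{code or 'unknown'}: {err.get('message', '')} ({hint})"


sample = ('{"error": {"message": "Low balance", '
          '"type": "insufficient_quota", "code": "insufficient_credits"}}')
print(explain_error(sample))
```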
Troubleshooting
A container won't start
Check port conflicts:
ss -tlnp | grep -E ':(3000|8000|4010|5434|6380)\b'
If something else owns one of those ports, stop it or remap in docker-compose.yml.
401 invalid_api_key on /v1/*
Three common causes:
- You passed the JWT instead of the sk-pr-... key. /v1/* only accepts API keys.
- You disabled the key in the dashboard.
- You copy-pasted with whitespace; the key must start with sk-pr-.
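Two of the causes above can be caught on the client before a request is sent. A minimal sketch, assuming JWTs can be recognized by their usual "eyJ" prefix (the base64url encoding of a JSON object's opening characters); check_api_key is an illustrative helper name.

```python
def check_api_key(raw: str) -> str:
    """Return a cleaned key or raise ValueError naming the likely cause."""
    key = raw.strip()  # drop whitespace picked up by copy-paste
    if key.startswith("eyJ"):
        # Base64url-encoded JSON headers begin with "eyJ", so this looks like a JWT.
        raise ValueError("this looks like a JWT; /v1/* only accepts sk-pr-... keys")
    if not key.startswith("sk-pr-"):
        raise ValueError("API keys must start with sk-pr-")
    return key


print(check_api_key("  sk-pr-live-EXAMPLE\n"))  # sk-pr-live-EXAMPLE
```

A disabled key still passes this check, so a 401 after it succeeds points at the key's status in the dashboard.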
402 insufficient_credits
Top up from the dashboard's /billing page or with POST /api/billing/topup. Balance is checked pre-flight with a $0.001 floor.
Balance not updating after a request
Deduction runs in a BackgroundTasks callback after the response is fully sent. Wait ~1s and re-check /api/billing/balance. Failed upstream calls don't charge but still write an event with status="error".
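Since the deduction lands shortly after the response, a script that needs the new balance can poll briefly instead of sleeping a fixed time. The sketch below takes a caller-supplied fetch_balance callable, which is assumed to call /api/billing/balance and return the balance as a Decimal; the function name and defaults are illustrative.

```python
import time
from decimal import Decimal
from typing import Callable


def wait_for_balance_change(fetch_balance: Callable[[], Decimal],
                            previous: Decimal,
                            attempts: int = 5,
                            delay_s: float = 0.5) -> Decimal:
    """Poll until the balance differs from `previous` or attempts run out."""
    current = previous
    for attempt in range(attempts):
        current = fetch_balance()
        if current != previous:
            break
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give the background task time to finish
    return current


# Simulated readings: unchanged twice, then debited by one small request.
readings = iter([Decimal("11.00"), Decimal("11.00"), Decimal("10.9999959")])
print(wait_for_balance_change(lambda: next(readings), Decimal("11.00"), delay_s=0))
```

If the returned value still equals the previous balance, the request may have failed upstream, in which case no charge is expected.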
JWT expired / 401 on /api/*
JWTs expire after 1 hour by default. Re-run POST /api/auth/login to get a new token, or refresh the dashboard page (cookies are renewed on each request).
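To check how long a token has left before it expires, you can read its payload locally. The sketch below assumes the access token carries the standard exp claim (seconds since the epoch), which this guide does not state explicitly. It decodes without verifying the signature, so use it only for diagnostics.

```python
import base64
import json
import time
from typing import Optional


def jwt_seconds_left(token: str, now: Optional[float] = None) -> float:
    """Seconds until the token's `exp` claim, read WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["exp"] - (time.time() if now is None else now)


def _b64(obj: dict) -> str:
    """Encode a dict the way JWT segments are encoded (for the demo token only)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# A made-up, unsigned token used purely to exercise the function.
demo = ".".join([_b64({"alg": "HS256"}), _b64({"sub": "alice", "exp": 2000}), "sig"])
print(jwt_seconds_left(demo, now=1000))  # 1000
```

A negative result means the token has already expired and a fresh login is needed.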
LiteLLM can't reach Ollama
curl -fsS http://192.168.6.55:11434/api/tags
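This should list the installed models; substitute your own Ollama host if it is not 192.168.6.55. If the call fails, confirm Ollama is up and bound to 0.0.0.0. If it succeeds, confirm the listed models include qwen3, qwen3.6, qwen3-coder, deepseek-r1, gemma4, and nomic-embed-text.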
Alembic migration failed
docker compose logs postgres | tail -50
docker compose exec api alembic current
docker compose exec api alembic upgrade head
If Postgres just started, give it a few seconds and retry. To nuke dev data: docker compose down -v.
model_not_allowed even though I'm on the right plan
Check your Account.plan_id via the dashboard /billing page. New accounts default to free, which only allows privaterouter/fast and privaterouter/embed. Plan upgrades are admin-driven in M2 (self-serve plan changes are scoped to M5 with Stripe).