PrivateRouter API Quickstart (M2 dev)

Zero to first OpenAI-compatible request in about five minutes, using per-user API keys and the /v1/* gateway that lands in Milestone 2.

In M2 you never talk to LiteLLM directly. You talk to FastAPI at http://localhost:8000/v1, authenticated by an sk-pr-... API key minted from the dashboard or the /api/keys endpoint. FastAPI checks your balance, forwards the request to LiteLLM using a per-user virtual key, and streams the response back. In the background it then records a usage event and deducts credits.

Step 1 — Start the stack

git clone <repo-url> privaterouter
cd privaterouter

cp .env.example .env
# Edit .env and set strong values for:
#   APP_SECRET_KEY, JWT_SECRET, LITELLM_MASTER_KEY, LITELLM_SALT_KEY,
#   POSTGRES_PASSWORD

docker compose up -d

On first boot the api container automatically runs alembic upgrade head and then python -m scripts.seed, which upserts the 5 plans and 6 models. Subsequent restarts converge idempotently.
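To illustrate what "upsert" and "converge idempotently" mean here, the following is a minimal sketch, not the project's actual scripts.seed. It assumes an in-memory SQLite database, a hypothetical plans table, and two sample rows; the real seed runs against Postgres and its schema is not shown in this guide.

```python
import sqlite3

# Hypothetical seed rows (plan name, requests per minute); the real list lives in scripts.seed.
PLANS = [("free", 20), ("starter", 60)]


def seed_plans(conn: sqlite3.Connection) -> None:
    """Insert each plan, or update it in place if the name already exists."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS plans (name TEXT PRIMARY KEY, rpm INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO plans (name, rpm) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET rpm = excluded.rpm",
        PLANS,
    )
    conn.commit()


def count_plans(conn: sqlite3.Connection) -> int:
    """Return the number of rows in the hypothetical plans table."""
    return conn.execute("SELECT COUNT(*) FROM plans").fetchone()[0]


conn = sqlite3.connect(":memory:")
seed_plans(conn)
seed_plans(conn)  # a second run changes nothing, which is what makes restarts safe
print(count_plans(conn))
```

Running the seed twice leaves the same rows behind, which is the property that lets the container re-run it on every restart.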

Sanity check:

docker compose ps
curl -fsS http://localhost:8000/health
curl -fsS http://127.0.0.1:4010/health/readiness

Both should return a JSON body with a status of "ok" or "healthy", for example {"status": "ok"}.

Step 2 — Sign up

Option A: dashboard (preferred)

Open http://localhost:3000/signup in a browser, fill in email + password, and submit. The dashboard sets an httpOnly cookie and drops you on /dashboard. An Account row is created automatically and you're put on the free plan ($1.00 of starter credits).

Option B: curl

curl -sS -X POST http://localhost:8000/api/auth/signup \
  -H 'Content-Type: application/json' \
  -d '{
    "email": "alice@example.com",
    "password": "correct-horse-battery-staple",
    "name": "Alice"
  }'

Response shape:

{
  "access_token": "eyJhbG...VCJ9...",
  "token_type": "bearer",
  "user": {
    "id": "8f2b1c4e-1a3d-4f5e-9b6a-2c0d4e7f8a91",
    "email": "alice@example.com",
    "name": "Alice",
    "role": "user",
    "status": "active",
    "created_at": "2026-05-15T06:23:00.123456Z"
  }
}

Signup also sets the same pr_session httpOnly cookie that /login does, so you can authenticate either by sending the access_token as a Bearer header or by reusing the cookie jar.

Capture the JWT for the rest of this guide:

TOKEN=$(curl -sS -X POST http://localhost:8000/api/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"email":"alice@example.com","password":"correct-horse-battery-staple"}' \
  | jq -r .access_token)

Verify it:

curl -sS http://localhost:8000/api/auth/me -H "Authorization: Bearer $TOKEN"

Step 3 — Top up credits

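The free plan ships with $1.00 of credits. At the privaterouter/fast prices listed under Available models ($0.10 per million input tokens, $0.30 per million output tokens), that covers roughly 3.3M output tokens or up to 10M input tokens. To play with bigger models, add more.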

Via the dashboard

Visit http://localhost:3000/billing, click $10, and your balance updates instantly.

Via curl

curl -sS -X POST http://localhost:8000/api/billing/topup \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"amount_usd": 10}'

Allowed amounts: 10, 25, 50, 100, 500 (USD). Response:

{
  "balance_usd": "11.00",
  "plan_name": "free",
  "monthly_quota_tokens": null
}

This is a mock top-up: no card is charged in M2. Real Stripe Checkout lands in M5.

Step 4 — Create an API key

Via the dashboard

Go to http://localhost:3000/keys.
Click Create new key and give it a name (e.g. laptop).
The sk-pr-... value is shown once, so copy it now. The dashboard only stores the prefix and a hash, so the full key cannot be shown again.

Via curl

curl -sS -X POST http://localhost:8000/api/keys \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name": "laptop"}'

Response:

{
  "key": "sk-pr-live-7Q2f...g8vM",
  "api_key": {
    "id": "1c9e9a4f-...",
    "name": "laptop",
    "key_prefix": "sk-pr-live-7Q2f",
    "status": "active",
    "monthly_limit_usd": null,
    "last_used_at": null,
    "created_at": "2026-05-15T06:24:11.000000Z"
  }
}

The key field is the only place the full secret is returned. If you lose it, delete the key and create a new one.

export PR_KEY="sk-pr-live-7Q2f...g8vM"

Step 5 — Your first inference request

Every example below points at http://localhost:8000/v1 and authenticates with sk-pr-.... The path is OpenAI-compatible — same shape as api.openai.com/v1/chat/completions.

curl

curl -sS http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/fast",
    "messages": [{"role": "user", "content": "Say hi in five words."}]
  }'

Python (`openai` SDK)

import os

from openai import OpenAI

# Read the key exported as PR_KEY in Step 4 instead of hard-coding it.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key=os.environ["PR_KEY"],
)

resp = client.chat.completions.create(
    model="privaterouter/fast",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(resp.choices[0].message.content)

Node (`openai` SDK)

import OpenAI from "openai";

// Read the key exported as PR_KEY in Step 4 instead of hard-coding it.
const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: process.env.PR_KEY,
});

const resp = await client.chat.completions.create({
  model: "privaterouter/fast",
  messages: [{ role: "user", content: "Say hi in five words." }],
});
console.log(resp.choices[0].message.content);

You'll get a standard OpenAI chat.completion envelope. A UsageEvent row is written in the background and your balance is debited within a few hundred ms.

Step 6 — Streaming

Add "stream": true and consume the SSE event stream. Use -N with curl so it doesn't buffer:

curl -N -sS http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/fast",
    "stream": true,
    "messages": [{"role": "user", "content": "Write one haiku about TCP."}]
  }'

Output is the standard OpenAI SSE format:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hand"},"index":0}], ...}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"shake"},"index":0}], ...}
...
data: [DONE]

Python:

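# Reuses the client created in Step 5.
stream = client.chat.completions.create(
    model="privaterouter/fast",
    messages=[{"role": "user", "content": "Stream a haiku."}],
    stream=True,
)
for chunk in stream:
    # A usage-only final chunk carries no choices, so guard before indexing.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
print()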

Usage tokens are captured from the final chunk's usage block (LiteLLM injects stream_options.include_usage=true for accurate billing).
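If you consume the stream without an SDK, the SSE format shown above can be parsed by hand. The sketch below is one way to do it offline: it assumes the final data chunk has an empty choices list and a usage object, as OpenAI-style streams do when include_usage is set. The sample lines and token counts are made up for illustration.

```python
import json


def parse_sse(lines):
    """Collect streamed text and the usage block from OpenAI-style SSE lines."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # The usage-only chunk has no choices, so this loop simply does nothing for it.
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


# Illustrative input; real chunks carry more fields (id, object, model, ...).
sample = [
    'data: {"choices":[{"delta":{"content":"Hand"},"index":0}]}',
    "",
    'data: {"choices":[{"delta":{"content":"shake"},"index":0}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":7,"completion_tokens":2}}',
    "data: [DONE]",
]
text, usage = parse_sse(sample)
print(text, usage)
```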

Step 7 — Inspect your usage

Every /v1/* call writes a usage_event row in the background.

Via the dashboard

http://localhost:3000/usage shows a paginated table with timestamps, model, token counts, latency, and per-request cost.

Via curl

curl -sS "http://localhost:8000/api/usage?limit=20" \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "items": [
    {
      "id": "...",
      "model_public_name": "privaterouter/fast",
      "input_tokens": 14,
      "output_tokens": 9,
      "cost_usd": "0.0000041",
      "latency_ms": 412,
      "status": "success",
      "created_at": "2026-05-15T06:25:33Z"
    }
  ],
  "total": 1,
  "limit": 20,
  "offset": 0
}
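A response page like the one above can be totalled client-side. This is a small sketch that assumes only the fields shown in the sample response; summarize_usage is a hypothetical helper, not part of any PrivateRouter client library.

```python
from decimal import Decimal


def summarize_usage(page: dict) -> dict:
    """Total tokens and cost for the items on one /api/usage page."""
    items = page["items"]
    return {
        "requests": len(items),
        "input_tokens": sum(i["input_tokens"] for i in items),
        "output_tokens": sum(i["output_tokens"] for i in items),
        # cost_usd arrives as a decimal string; Decimal keeps the sum exact.
        "cost_usd": sum((Decimal(i["cost_usd"]) for i in items), Decimal("0")),
        # True when later pages remain, based on total, offset and page size.
        "has_more": page["offset"] + len(items) < page["total"],
    }


page = {
    "items": [
        {"input_tokens": 14, "output_tokens": 9, "cost_usd": "0.0000041"},
    ],
    "total": 1,
    "limit": 20,
    "offset": 0,
}
summary = summarize_usage(page)
print(summary)
```

When has_more is true, the next page would presumably be requested by raising offset by the number of items received; the offset query parameter is inferred from the response fields rather than documented here.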

After each request, the updated balance is available from /api/billing/balance.

Step 8 — Embeddings

curl -sS http://localhost:8000/v1/embeddings \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/embed",
    "input": "Vectors are nice"
  }'

Returns an OpenAI-shaped envelope with a single 768-dim vector (nomic-embed-text). The embed model is on the free plan's allow-list, so it works out of the box.

Python:

emb = client.embeddings.create(
    model="privaterouter/embed",
    input="Vectors are nice",
)
print(len(emb.data[0].embedding))  # 768

Available models

Public name	Upstream model	Family	Context	Input $/M	Output $/M
`privaterouter/fast`	`gemma4:latest`	gemma	8192	`0.100000`	`0.300000`
`privaterouter/qwen-fast`	`qwen3:latest`	qwen	32768	`0.100000`	`0.300000`
`privaterouter/qwen-pro`	`qwen3.6:latest`	qwen	32768	`0.500000`	`1.500000`
`privaterouter/deepseek-code`	`qwen3-coder:latest`	deepseek	32768	`0.600000`	`1.800000`
`privaterouter/deepseek-reason`	`deepseek-r1:latest`	deepseek	32768	`0.300000`	`0.900000`
`privaterouter/embed`	`nomic-embed-text` (768d)	embedding	8192	`0.050000`	`0.000000`

Prices are USD per million tokens, returned as decimal strings by the API to avoid float drift.
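To see how the per-million prices turn into a per-request cost, here is a short sketch using Python's decimal module. It assumes cost is simply input tokens times the input price plus output tokens times the output price; the gateway's exact rounding is not documented in this guide. The PRICES dict copies two rows from the table as an example.

```python
from decimal import Decimal

# (input $/M, output $/M), copied from the table above as strings to avoid float drift.
PRICES = {
    "privaterouter/fast": (Decimal("0.100000"), Decimal("0.300000")),
    "privaterouter/qwen-pro": (Decimal("0.500000"), Decimal("1.500000")),
}
MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated cost in USD for one request, assuming simple linear pricing."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / MILLION


cost = request_cost("privaterouter/fast", 14, 9)
print(cost)
```

For 14 input and 9 output tokens on privaterouter/fast this agrees with the cost_usd value of 0.0000041 in the Step 7 sample response.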

List what your key can actually see:

curl -sS http://localhost:8000/v1/models -H "Authorization: Bearer $PR_KEY"

The list is filtered against your plan's allow-list.

Plan reference

Plan	Monthly	Included credits	Allowed models	RPM
`free`	$0	$1.00	`fast`, `embed`	20
`starter`	$10	$12.00	`fast`, `qwen-fast`, `embed`	60
`pro`	$20	$25.00	`fast`, `qwen-fast`, `qwen-pro`, `deepseek-code`, `embed`	120
`developer`	$49	$65.00	all	300
`team`	$199	$275.00	all	1000
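The allow-lists in the table can be checked client-side before sending a request. The sketch below hard-codes the table as it appears here, so it will drift if plans change; the authoritative answer is always GET /v1/models. The helper names are hypothetical.

```python
PREFIX = "privaterouter/"
ALL_MODELS = {"fast", "qwen-fast", "qwen-pro", "deepseek-code", "deepseek-reason", "embed"}

# Ordered from cheapest to most expensive monthly price, as in the table above.
PLAN_MODELS = {
    "free": {"fast", "embed"},
    "starter": {"fast", "qwen-fast", "embed"},
    "pro": {"fast", "qwen-fast", "qwen-pro", "deepseek-code", "embed"},
    "developer": ALL_MODELS,
    "team": ALL_MODELS,
}


def short_name(model: str) -> str:
    """Strip the privaterouter/ prefix if present."""
    return model[len(PREFIX):] if model.startswith(PREFIX) else model


def is_allowed(plan: str, model: str) -> bool:
    """True if the plan's allow-list contains the model."""
    return short_name(model) in PLAN_MODELS[plan]


def cheapest_plan_for(model: str):
    """First plan, in price order, that allows the model; None if no plan does."""
    for plan, models in PLAN_MODELS.items():
        if short_name(model) in models:
            return plan
    return None


print(is_allowed("free", "privaterouter/qwen-fast"))
print(cheapest_plan_for("privaterouter/deepseek-reason"))
```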

See pricing.md for full per-model math and fair-use limits.

Error reference

All gateway errors follow the canonical OpenAI shape:

{ "error": { "message": "...", "type": "...", "code": "..." } }

HTTP	`type`	`code`	When
400	`invalid_request_error`	`invalid_body`	Missing/invalid JSON, missing `model` field
401	`invalid_request_error`	`invalid_api_key`	Missing/malformed/unknown/disabled `sk-pr-...` key
402	`insufficient_quota`	`insufficient_credits`	Balance below $0.001 — top up at `/billing`
403	`invalid_request_error`	`model_not_allowed`	Model not in your plan's allow-list
404	`invalid_request_error`	`endpoint_not_supported`	Calling an unimplemented `/v1/*` route
503	`upstream_error`	`upstream_unconfigured`	API key has no LiteLLM virtual key (re-create)
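Because every gateway error shares one envelope, a client can map the code field to a next step. The following is a sketch of that idea; the hint wording is paraphrased from the table above, and explain_error is a hypothetical helper rather than part of any SDK.

```python
import json

# Suggested next steps, keyed by the `code` values in the table above.
HINTS = {
    "invalid_body": "send valid JSON that includes a model field",
    "invalid_api_key": "pass an active sk-pr-... key, not the JWT",
    "insufficient_credits": "top up via /billing or POST /api/billing/topup",
    "model_not_allowed": "pick a model on your plan's allow-list",
    "endpoint_not_supported": "this /v1/* route is not implemented",
    "upstream_unconfigured": "re-create the API key",
}


def explain_error(status: int, body: str) -> str:
    """Turn a gateway error response into a one-line hint."""
    try:
        err = json.loads(body)["error"]
        code = err.get("code")
        message = err.get("message", "")
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the documented envelope (for example, an HTML page from a proxy).
        return f"HTTP {status}: unrecognised error body"
    hint = HINTS.get(code, "no specific advice for this code")
    return f"HTTP {status} {code}: {message} ({hint})"


body = '{"error": {"message": "Balance too low", "type": "insufficient_quota", "code": "insufficient_credits"}}'
print(explain_error(402, body))
```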

Troubleshooting

A container won't start

Check port conflicts:

ss -tlnp | grep -E ':(3000|8000|4010|5434|6380)\b'

If something else owns one of those ports, stop it or remap the port in docker-compose.yml.

`401 invalid_api_key` on /v1/*

Three common causes:

You passed the JWT instead of the sk-pr-... key. /v1/* only accepts API keys.
You disabled the key in the dashboard.
You copy-pasted with whitespace; the key must start with sk-pr-.
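Two of these causes can be caught before the request is sent. The sketch below flags stray whitespace and JWT-shaped values; it relies on the fact that JWTs are three dot-separated base64url segments whose header starts with "eyJ". It cannot tell whether a key has been disabled, since only the gateway knows that. check_gateway_key is a hypothetical helper.

```python
def check_gateway_key(value: str) -> list:
    """Return a list of problems with a credential meant for /v1/*."""
    problems = []
    cleaned = value.strip()
    if cleaned != value:
        problems.append("leading or trailing whitespace")
    if cleaned.startswith("eyJ") and cleaned.count(".") == 2:
        problems.append("looks like a JWT, not an API key")
    elif not cleaned.startswith("sk-pr-"):
        problems.append("does not start with sk-pr-")
    return problems


print(check_gateway_key(" sk-pr-live-example\n"))
```

An empty list means the value at least has the right shape; a 401 after that points at an unknown or disabled key.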

`402 insufficient_credits`

Top up from the dashboard's /billing page or with POST /api/billing/topup. The balance is checked before each request is forwarded, with a $0.001 floor.

Balance not updating after a request

Deduction runs in a BackgroundTasks callback after the response is fully sent. Wait ~1s and re-check /api/billing/balance. Failed upstream calls don't charge but still write an event with status="error".
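In scripts and tests, a short polling loop is more reliable than a fixed sleep. This sketch assumes you supply fetch_balance, a callable of your own that reads /api/billing/balance and returns the balance as a Decimal; the response shape of that endpoint is not shown in this guide, so the HTTP call is deliberately left out.

```python
import time
from decimal import Decimal


def wait_for_debit(fetch_balance, before, attempts=5, delay=0.5, sleep=time.sleep):
    """Poll until the balance drops below `before`; return it, or None on timeout."""
    for attempt in range(attempts):
        current = fetch_balance()
        if current < before:
            return current
        if attempt < attempts - 1:
            sleep(delay)  # no need to sleep after the last attempt
    return None


# Stand-in for real HTTP calls: the debit shows up on the third read.
readings = iter([Decimal("11.00"), Decimal("11.00"), Decimal("10.9999959")])
result = wait_for_debit(lambda: next(readings), Decimal("11.00"), sleep=lambda s: None)
print(result)
```

A None result after a failed upstream call is expected, since those requests are not charged.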

JWT expired / 401 on `/api/*`

JWTs expire after 1 hour by default. Re-run POST /api/auth/login, or refresh the dashboard page (cookies are renewed on each request).
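To check how long a token has left before you hit a 401, you can read its payload locally. This sketch assumes the token carries the standard exp claim (seconds since the epoch), which this guide does not confirm. It only decodes the payload and does not verify the signature, so use it for diagnostics, never for authorization.

```python
import base64
import json
import time


def jwt_seconds_left(token: str, now=None) -> float:
    """Seconds until the token's exp claim, read without verifying the signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments in JWTs omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    now = time.time() if now is None else now
    return claims["exp"] - now


# Build a throwaway token with a fake header and signature for demonstration.
demo_payload = base64.urlsafe_b64encode(json.dumps({"exp": 2000}).encode()).rstrip(b"=").decode()
demo_token = "header." + demo_payload + ".signature"
print(jwt_seconds_left(demo_token, now=1400))
```

A negative result means the token has already expired and you need to log in again.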

LiteLLM can't reach Ollama

curl -fsS http://192.168.6.55:11434/api/tags

This should list the installed models. If it does not, confirm that Ollama is up and bound to 0.0.0.0, and that the listed models include qwen3, qwen3.6, qwen3-coder, deepseek-r1, gemma4, nomic-embed-text.

Alembic migration failed

docker compose logs postgres | tail -50
docker compose exec api alembic current
docker compose exec api alembic upgrade head

If Postgres just started, give it a few seconds and retry. To nuke dev data: docker compose down -v.

`model_not_allowed` even though I'm on the right plan

Check your Account.plan_id via the dashboard /billing page. New accounts default to free, which only allows privaterouter/fast and privaterouter/embed. Plan upgrades are admin-driven in M2 (self-serve plan changes are scoped to M5 with Stripe).
