API quickstart

OpenAI-compatible /v1 endpoint usage

PrivateRouter API Quickstart (M2 dev)

Zero to first OpenAI-compatible request in about five minutes, using per-user API keys and the /v1/* gateway that lands in Milestone 2.

In M2 you never talk to LiteLLM directly. You talk to FastAPI at http://localhost:8000/v1, authenticated by an sk-pr-... API key minted from the dashboard or the /api/keys endpoint. FastAPI checks your balance, forwards the request to LiteLLM using a per-user virtual key, and streams the response back. It then records a usage event and deducts credits in the background.


Step 1 — Start the stack

git clone <repo-url> privaterouter
cd privaterouter

cp .env.example .env
# Edit .env and set strong values for:
#   APP_SECRET_KEY, JWT_SECRET, LITELLM_MASTER_KEY, LITELLM_SALT_KEY,
#   POSTGRES_PASSWORD

docker compose up -d

On first boot the api container automatically runs alembic upgrade head and then python -m scripts.seed, which upserts the 5 plans and 6 models. Subsequent restarts converge idempotently.

Sanity check:

docker compose ps
curl -fsS http://localhost:8000/health
curl -fsS http://127.0.0.1:4010/health/readiness

Both should return {"status": "ok"} (or healthy).


Step 2 — Sign up

Option A: dashboard (preferred)

Open http://localhost:3000/signup in a browser, fill in email + password, and submit. The dashboard sets an httpOnly cookie and drops you on /dashboard. An Account row is created automatically and you're put on the free plan ($1.00 of starter credits).

Option B: curl

curl -sS -X POST http://localhost:8000/api/auth/signup \
  -H 'Content-Type: application/json' \
  -d '{
    "email": "alice@example.com",
    "password": "correct-horse-battery-staple",
    "name": "Alice"
  }'

Response shape:

{
  "access_token": "eyJhbG...VCJ9...",
  "token_type": "bearer",
  "user": {
    "id": "8f2b1c4e-1a3d-4f5e-9b6a-2c0d4e7f8a91",
    "email": "alice@example.com",
    "name": "Alice",
    "role": "user",
    "status": "active",
    "created_at": "2026-05-15T06:23:00.123456Z"
  }
}

Signup also sets the same pr_session httpOnly cookie that /login does, so you can authenticate either by sending access_token in an Authorization: Bearer header or by reusing the cookie jar.

Capture the JWT for the rest of this guide:

TOKEN=$(curl -sS -X POST http://localhost:8000/api/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"email":"alice@example.com","password":"correct-horse-battery-staple"}' \
  | jq -r .access_token)

Verify it:

curl -sS http://localhost:8000/api/auth/me -H "Authorization: Bearer $TOKEN"

Step 3 — Top up credits

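The free plan ships with $1.00 of credits. At privaterouter/fast prices ($0.10 per million input tokens, $0.30 per million output tokens) that buys roughly 3.3M output tokens or up to 10M input tokens, depending on your input/output mix. To play with bigger models, add more.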

Via the dashboard

Visit http://localhost:3000/billing, click $10, and your balance updates instantly.

Via curl

curl -sS -X POST http://localhost:8000/api/billing/topup \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"amount_usd": 10}'

Allowed amounts: 10, 25, 50, 100, 500 (USD). Response:

{
  "balance_usd": "11.00",
  "plan_name": "free",
  "monthly_quota_tokens": null
}

Mock topup. No card is charged in M2. Real Stripe Checkout lands in M5.


Step 4 — Create an API key

Via the dashboard

  1. Go to http://localhost:3000/keys
  2. Click Create new key, give it a name (e.g. laptop)
  3. The sk-pr-... value is shown once — copy it now. The dashboard only stores the prefix and a hash; we cannot show it again.

Via curl

curl -sS -X POST http://localhost:8000/api/keys \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name": "laptop"}'

Response:

{
  "key": "sk-pr-live-7Q2f...g8vM",
  "api_key": {
    "id": "1c9e9a4f-...",
    "name": "laptop",
    "key_prefix": "sk-pr-live-7Q2f",
    "status": "active",
    "monthly_limit_usd": null,
    "last_used_at": null,
    "created_at": "2026-05-15T06:24:11.000000Z"
  }
}

The key field is the only place the full secret is returned. If you lose it, delete and re-create.

export PR_KEY="sk-pr-live-7Q2f...g8vM"
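
Storing only a prefix and a hash, as described above, means the full secret cannot be recovered later. The following is a minimal sketch of that idea, not the project's actual storage code: SHA-256, the 15-character prefix length (chosen to match "sk-pr-live-7Q2f" in the sample response), and the helper names are all assumptions made for illustration.

```python
import hashlib
import hmac

# Assumption: prefix length matches "sk-pr-live-7Q2f" from the sample response.
PREFIX_LEN = 15


def to_stored_record(full_key: str) -> dict:
    """Derive what a server could persist: a display prefix and a digest."""
    digest = hashlib.sha256(full_key.encode("utf-8")).hexdigest()
    return {"key_prefix": full_key[:PREFIX_LEN], "key_hash": digest}


def matches(presented_key: str, record: dict) -> bool:
    """Compare a presented key against the stored digest in constant time."""
    digest = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, record["key_hash"])


# Made-up key value, used only to exercise the helpers.
record = to_stored_record("sk-pr-live-EXAMPLEONLY0000")
print(record["key_prefix"])
print(matches("sk-pr-live-EXAMPLEONLY0000", record))
print(matches("sk-pr-live-WRONG", record))
```

Because only the digest is kept, a lost key can be replaced but never displayed again, which is why the guide says to delete and re-create.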

Step 5 — Your first inference request

Every example below points at http://localhost:8000/v1 and authenticates with sk-pr-.... The path is OpenAI-compatible — same shape as api.openai.com/v1/chat/completions.

curl

curl -sS http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/fast",
    "messages": [{"role": "user", "content": "Say hi in five words."}]
  }'

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-pr-live-7Q2f...g8vM",
)

resp = client.chat.completions.create(
    model="privaterouter/fast",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(resp.choices[0].message.content)

Node (openai SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: "sk-pr-live-7Q2f...g8vM",
});

const resp = await client.chat.completions.create({
  model: "privaterouter/fast",
  messages: [{ role: "user", content: "Say hi in five words." }],
});
console.log(resp.choices[0].message.content);

You'll get a standard OpenAI chat.completion envelope. A UsageEvent row is written in the background and your balance is debited within a few hundred ms.


Step 6 — Streaming

Add "stream": true and consume the SSE event stream. Use -N with curl so it doesn't buffer:

curl -N -sS http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/qwen-fast",
    "stream": true,
    "messages": [{"role": "user", "content": "Write one haiku about TCP."}]
  }'

Output is the standard OpenAI SSE format:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hand"},"index":0}], ...}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"shake"},"index":0}], ...}
...
data: [DONE]

Python:

stream = client.chat.completions.create(
    model="privaterouter/qwen-fast",
    messages=[{"role": "user", "content": "Stream a haiku."}],
    stream=True,
)
for chunk in stream:
    # A final usage-only chunk may carry an empty choices list; skip it.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

Usage tokens are captured from the final chunk's usage block (LiteLLM injects stream_options.include_usage=true for accurate billing).
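
For clients that cannot use the SDK, the SSE framing shown above can be parsed by hand. The following is a minimal sketch rather than the gateway's own parser: iter_content is a hypothetical helper, and the sample lines (including a usage-only chunk with an empty choices list) are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # A usage-only chunk has no choices, so this loop simply does nothing.
        for choice in chunk.get("choices") or []:
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"id":"chatcmpl-x","choices":[{"delta":{"content":"Hand"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-x","choices":[{"delta":{"content":"shake"},"index":0}]}',
    'data: {"id":"chatcmpl-x","choices":[],"usage":{"prompt_tokens":9,"completion_tokens":2}}',
    "data: [DONE]",
]
print("".join(iter_content(sample)))
```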


Step 7 — Inspect your usage

Every /v1/* call writes a usage_event row in the background.

Via the dashboard

http://localhost:3000/usage shows a paginated table with timestamps, model, token counts, latency, and per-request cost.

Via curl

curl -sS "http://localhost:8000/api/usage?limit=20" \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "items": [
    {
      "id": "...",
      "model_public_name": "privaterouter/fast",
      "input_tokens": 14,
      "output_tokens": 9,
      "cost_usd": "0.0000041",
      "latency_ms": 412,
      "status": "success",
      "created_at": "2026-05-15T06:25:33Z"
    }
  ],
  "total": 1,
  "limit": 20,
  "offset": 0
}

Balance updates land in /api/billing/balance.
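
To walk the whole usage history, the items/total/limit/offset fields in the response above suggest offset-based paging. The sketch below assumes the endpoint accepts an offset query parameter alongside limit (the guide only shows limit), and it uses a fake fetcher in place of a real HTTP call so it runs offline. iter_usage and fake_fetch are hypothetical names.

```python
from typing import Callable, Iterator

Page = dict  # shape assumed from the sample response: items, total, limit, offset


def iter_usage(fetch_page: Callable[[int, int], Page], limit: int = 20) -> Iterator[dict]:
    """Yield every usage item, advancing offset until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page too, in case total changes between calls.
        if not items or offset >= page["total"]:
            break


# Stand-in for GET /api/usage?limit=...&offset=... (offset is an assumption).
FAKE_ROWS = [{"id": str(n)} for n in range(45)]


def fake_fetch(limit: int, offset: int) -> Page:
    return {
        "items": FAKE_ROWS[offset:offset + limit],
        "total": len(FAKE_ROWS),
        "limit": limit,
        "offset": offset,
    }


print(len(list(iter_usage(fake_fetch))))
```

In real use, fetch_page would issue the authenticated request with the JWT and return the decoded JSON body.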


Step 8 — Embeddings

curl -sS http://localhost:8000/v1/embeddings \
  -H "Authorization: Bearer $PR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "privaterouter/embed",
    "input": "Vectors are nice"
  }'

Returns an OpenAI-shaped envelope with a single 768-dim vector (nomic-embed-text). The embed model is on the free plan's allow-list, so it works out of the box.

Python:

emb = client.embeddings.create(
    model="privaterouter/embed",
    input="Vectors are nice",
)
print(len(emb.data[0].embedding))  # 768
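
A common next step with embeddings is comparing two of them. As a sketch, cosine similarity can be computed in plain Python; the two-element vectors below are stand-ins for the 768-float lists you would read from emb.data[i].embedding, and cosine_similarity is a hypothetical helper, not part of the SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Tiny stand-in vectors: identical direction, then orthogonal.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```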

Available models

| Public name | Upstream model | Family | Context | Input $/M | Output $/M |
| --- | --- | --- | --- | --- | --- |
| privaterouter/fast | gemma4:latest | gemma | 8192 | 0.100000 | 0.300000 |
| privaterouter/qwen-fast | qwen3:latest | qwen | 32768 | 0.100000 | 0.300000 |
| privaterouter/qwen-pro | qwen3.6:latest | qwen | 32768 | 0.500000 | 1.500000 |
| privaterouter/deepseek-code | qwen3-coder:latest | deepseek | 32768 | 0.600000 | 1.800000 |
| privaterouter/deepseek-reason | deepseek-r1:latest | deepseek | 32768 | 0.300000 | 0.900000 |
| privaterouter/embed | nomic-embed-text (768d) | embedding | 8192 | 0.050000 | 0.000000 |

Prices are USD per million tokens, returned as decimal strings by the API to avoid float drift.
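
To keep that precision on the client side, the decimal strings can be fed straight into Python's Decimal type. The sketch below is one way to do the per-request arithmetic, not the gateway's billing code; request_cost_usd is a hypothetical helper. With the privaterouter/fast prices and the 14 input / 9 output tokens from the sample usage event, it reproduces that event's cost_usd of 0.0000041.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: str, output_per_m: str) -> Decimal:
    """Cost of one request from per-million prices given as decimal strings."""
    total_per_million = (Decimal(input_tokens) * Decimal(input_per_m)
                         + Decimal(output_tokens) * Decimal(output_per_m))
    return total_per_million / MILLION


cost = request_cost_usd(14, 9, "0.100000", "0.300000")
print(cost)

# The kind of binary float drift that decimal strings sidestep:
print(0.1 + 0.2 == 0.3)
```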

List what your key can actually see:

curl -sS http://localhost:8000/v1/models -H "Authorization: Bearer $PR_KEY"

The list is filtered against your plan's allow-list.


Plan reference

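| Plan | Monthly | Included credits | Allowed models | RPM |
| --- | --- | --- | --- | --- |
| free | $0 | $1.00 | fast, embed | 20 |
| starter | $10 | $12.00 | fast, qwen-fast, embed | 60 |
| pro | $20 | $25.00 | fast, qwen-fast, qwen-pro, deepseek-code, embed | 120 |
| developer | $49 | $65.00 | all | 300 |
| team | $199 | $275.00 | all | 1000 |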

See pricing.md for full per-model math and fair-use limits.
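
If you want to check a model against a plan before sending a request, the allow-lists in the plan table can be transcribed into a small lookup. This is a client-side sketch only: the authoritative list is whatever /v1/models returns for your key. The names PLAN_MODELS and is_allowed are hypothetical, and "all" is expanded by hand from the six models listed under Available models.

```python
# Transcribed from the plan table; "all" is expanded manually.
ALL_MODELS = {"fast", "qwen-fast", "qwen-pro", "deepseek-code", "deepseek-reason", "embed"}
PLAN_MODELS = {
    "free": {"fast", "embed"},
    "starter": {"fast", "qwen-fast", "embed"},
    "pro": {"fast", "qwen-fast", "qwen-pro", "deepseek-code", "embed"},
    "developer": ALL_MODELS,
    "team": ALL_MODELS,
}


def is_allowed(plan: str, model: str) -> bool:
    """True if the model (with or without the privaterouter/ prefix) is on the plan."""
    short_name = model.split("/", 1)[-1]
    return short_name in PLAN_MODELS.get(plan, set())


print(is_allowed("free", "privaterouter/fast"))
print(is_allowed("free", "privaterouter/qwen-fast"))
```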


Error reference

All gateway errors follow the canonical OpenAI shape:

{ "error": { "message": "...", "type": "...", "code": "..." } }

| HTTP | type | code | When |
| --- | --- | --- | --- |
| 400 | invalid_request_error | invalid_body | Missing/invalid JSON, missing model field |
| 401 | invalid_request_error | invalid_api_key | Missing/malformed/unknown/disabled sk-pr-... key |
| 402 | insufficient_quota | insufficient_credits | Balance below $0.001; top up at /billing |
| 403 | invalid_request_error | model_not_allowed | Model not in your plan's allow-list |
| 404 | invalid_request_error | endpoint_not_supported | Calling an unimplemented /v1/* route |
| 503 | upstream_error | upstream_unconfigured | API key has no LiteLLM virtual key (re-create) |
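
Because every error shares one envelope, a client can branch on the code field rather than on message text. The sketch below shows one way to turn an error body into an actionable hint. The hint wording is paraphrased from the table, explain_error is a hypothetical helper, and the sample message is invented.

```python
import json

# Suggested reactions keyed by the "code" values from the error table.
HINTS = {
    "invalid_body": "fix the request JSON and include a model field",
    "invalid_api_key": "send a valid sk-pr-... key, not the JWT",
    "insufficient_credits": "top up at /billing",
    "model_not_allowed": "pick a model on your plan's allow-list",
    "endpoint_not_supported": "use an implemented /v1/* route",
    "upstream_unconfigured": "re-create the API key",
}


def explain_error(body: str) -> str:
    """Summarize an OpenAI-shaped error body as 'code: message (hint)'."""
    try:
        err = json.loads(body)["error"]
        code, message = err.get("code"), err.get("message", "")
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return "unrecognized error body"
    hint = HINTS.get(code, "no specific hint")
    return f"{code}: {message} ({hint})"


sample = '{"error": {"message": "Balance too low", "type": "insufficient_quota", "code": "insufficient_credits"}}'
print(explain_error(sample))
```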

Troubleshooting

A container won't start

Check port conflicts:

ss -tlnp | grep -E ':(3000|8000|4010|5434|6380)\b'

If something else owns one of those ports, stop it or remap in docker-compose.yml.

401 invalid_api_key on /v1/*

Three common causes:

  1. You passed the JWT instead of the sk-pr-... key. /v1/* only accepts API keys.
  2. You disabled the key in the dashboard.
  3. You copy-pasted with whitespace; the key must start with sk-pr-.
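
Causes 1 and 3 can be caught locally before a request is sent; a disabled key (cause 2) can only be detected by the server. The sketch below is a rough heuristic, not the gateway's validation logic: diagnose_key is a hypothetical helper, and the JWT check relies on the fact that a base64url-encoded JSON header starts with "eyJ" and that the token has three dot-separated parts.

```python
def diagnose_key(value: str) -> str:
    """Return 'ok' or a short reason the value would be rejected by /v1/*."""
    if value != value.strip():
        return "surrounding whitespace"
    if value.startswith("eyJ") and value.count(".") == 2:
        return "looks like a JWT, not an API key"
    if not value.startswith("sk-pr-"):
        return "missing sk-pr- prefix"
    return "ok"


# Made-up values that exercise each branch.
for candidate in ["sk-pr-live-abc", " sk-pr-live-abc\n", "eyJhbGciOi.eyJzdWIi.sig", "abc"]:
    print(repr(candidate), "->", diagnose_key(candidate))
```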

402 insufficient_credits

Visit /billing or call POST /api/billing/topup. Balance is checked pre-flight with a $0.001 floor.

Balance not updating after a request

Deduction runs in a BackgroundTasks callback after the response is fully sent. Wait ~1s and re-check /api/billing/balance. Failed upstream calls don't charge but still write an event with status="error".
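
Scripts that assert on the balance right after a request can poll briefly instead of sleeping for a fixed time. The following is a sketch under stated assumptions: wait_for_debit is a hypothetical helper, get_balance stands in for a call to /api/billing/balance, and the attempt count and delay are arbitrary defaults. The demo uses canned values so it runs offline.

```python
import time
from decimal import Decimal
from typing import Callable


def wait_for_debit(get_balance: Callable[[], Decimal], before: Decimal,
                   attempts: int = 5, delay_s: float = 0.5) -> Decimal:
    """Poll until the balance drops below `before` or attempts run out."""
    balance = get_balance()
    for _ in range(attempts - 1):
        if balance < before:
            break
        time.sleep(delay_s)
        balance = get_balance()
    return balance


# Offline stand-in: the debit becomes visible on the third read.
reads = iter([Decimal("11.00"), Decimal("11.00"), Decimal("10.9999959")])
print(wait_for_debit(lambda: next(reads), Decimal("11.00"), delay_s=0.0))
```

If the returned value still equals the starting balance, the request may have failed upstream, in which case no charge is expected.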

JWT expired / 401 on /api/*

JWTs expire after 1 hour by default. Re-run POST /api/auth/login or refresh the dashboard page (cookies are renewed on each request).
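
To tell an expired token from another kind of 401, you can read the expiry out of the token itself. This sketch assumes the JWT carries the standard exp claim (Unix seconds), which the guide does not confirm. It decodes the payload without verifying the signature, so use it for diagnostics only. jwt_expiry and is_expired are hypothetical helpers, and the demo token is unsigned and built locally.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Read the exp claim from a JWT payload without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True if the token has an exp claim that is at or before `now`."""
    exp = jwt_expiry(token)
    current = time.time() if now is None else now
    return exp is not None and current >= exp


def _b64(obj: dict) -> str:
    # Encode a dict the way JWT segments are encoded (base64url, no padding).
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Unsigned demo token with an exp far in the past.
demo = ".".join([_b64({"alg": "none"}), _b64({"sub": "alice", "exp": 1000}), "sig"])
print(is_expired(demo))
```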

LiteLLM can't reach Ollama

curl -fsS http://192.168.6.55:11434/api/tags

Should list models. If not: confirm Ollama is up and bound to 0.0.0.0, and confirm the listed models include qwen3, qwen3.6, qwen3-coder, deepseek-r1, gemma4, nomic-embed-text.

Alembic migration failed

docker compose logs postgres | tail -50
docker compose exec api alembic current
docker compose exec api alembic upgrade head

If Postgres just started, give it a few seconds and retry. To nuke dev data: docker compose down -v.

model_not_allowed even though I'm on the right plan

Check your Account.plan_id via the dashboard /billing page. New accounts default to free, which only allows privaterouter/fast and privaterouter/embed. Plan upgrades are admin-driven in M2 (self-serve plan changes are scoped to M5 with Stripe).