p50 latency: —
p95 latency: —
Success: —
Requests (7d): —
$/1M input tokens: $0.05
$/1M output tokens: $0.20
Context: 131k tokens
Not enough traffic in the last 7 days to compute aggregate stats yet. Charts below show whatever hourly buckets we do have.
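As a rough illustration of the listed prices, the sketch below estimates the cost of a single request. It assumes billing is strictly linear in token count at $0.05 per 1M input tokens and $0.20 per 1M output tokens, with no minimums, rounding, or other fees. The function name `estimate_cost_usd` and the token counts are hypothetical.

```python
# Listed prices from this page, in USD per 1M tokens.
PRICE_IN_PER_M = 0.05
PRICE_OUT_PER_M = 0.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, assuming billing is linear in token count."""
    total = input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M
    return total / 1_000_000


# Hypothetical request: 100k input tokens and 20k output tokens.
cost = estimate_cost_usd(100_000, 20_000)
print(f"${cost:.4f}")
```

Under these assumptions, output tokens cost four times as much as input tokens, so long generations dominate the bill even when prompts are large.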
[Chart: Latency (last 7 days, hourly); series: p50, p95]
[Chart: Requests per hour (last 7 days)]
Switch from OpenAI
Drop-in: point the client at the PrivateRouter base URL, swap in your API key and the model name, and keep the rest of your code.
# Before:
# client = OpenAI()
# client.chat.completions.create(model="gpt-4o", ...)
#
# After:
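import os
from openai import OpenAI

client = OpenAI(
    # Read the key from the environment; Python does not expand "$VAR" inside a string.
    api_key=os.environ["PRIVATEROUTER_API_KEY"],
    base_url="https://api.privaterouter.com/v1",
)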
resp = client.chat.completions.create(
    model="privaterouter/translategemma-4b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Sample prompts
Explain transformers in two sentences.
Transformers use self-attention to weigh how every token in a sequence relates to every other token, all in parallel. That replaces the slow recurrence of RNNs with a single big matrix multiply, which is why they scale so well on GPUs.
Write a regex that matches a US ZIP code.
^\d{5}(-\d{4})?$
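To check the sample regex above, here is a minimal sketch using Python's standard `re` module. The helper name `is_us_zip` is hypothetical, and the `re.ASCII` flag is an added assumption so that `\d` matches only `0-9` rather than any Unicode digit.

```python
import re

# The pattern from the sample response: five digits, optionally followed by
# a hyphen and four more digits (ZIP+4).
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$", re.ASCII)


def is_us_zip(text: str) -> bool:
    """Return True if the whole string is a 5-digit ZIP or a ZIP+4 code."""
    # fullmatch rejects a trailing newline, which `$` with re.match would allow.
    return ZIP_RE.fullmatch(text) is not None


for candidate in ["12345", "12345-6789", "1234", "12345-678"]:
    print(candidate, is_us_zip(candidate))
```

The pattern only validates the format. It does not confirm that a given code is actually assigned to a delivery area.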