Private AI Routing
for Builders.
One OpenAI-compatible API, ten open models, encrypted document RAG, live web search, and a built-in chat — all running on GPUs we own. Pay with Bitcoin, ship in two lines.
One catalog. 5 open models live, 3 more incoming.
Every model below is self-hosted on PrivateRouter GPUs. No closed-model resale, no data sent to OpenAI or Anthropic. Pay per token, point your OpenAI SDK at privaterouter.com/v1, and ship.
DeepSeek-OCR — token-efficient document understanding via Contexts Optical Compression. Multilingual, MIT-licensed.
Gemma3 12B — Google's open weights with vision and broad knowledge.
OpenAI gpt-oss 120B — flagship Apache-2.0 MoE (~5B active). Configurable reasoning effort, native tool use, 131K context.
Llama 3.1 8B — 128K context window for documents and long sessions.
Qwen3 Coder 30B — fast, accurate code generation across 100+ languages.
OpenAI gpt-oss 20B — Apache-2.0 MoE with native MXFP4. Fast, agentic, tool-use ready. 131K context.
Final benchmarking in progress. Launching once we lock in the MXFP4 throughput targets.
Kimi K2.5 — 1T-parameter MoE flagship (32B active) in NVIDIA's NVFP4 quantization. Self-hosted on Blackwell, lossless vs BF16 on coding & math benchmarks.
Reserved on Blackwell GPU capacity (NVFP4 build). Get notified the moment it goes live.
Qwen3 32B — deep reasoning, long-form planning, and tool use.
Reserved for an upcoming GPU capacity tier. Get notified when this model goes live.
Free to start
Try every model with $1 of free credit.
No card required. Sign up takes 30 seconds and you keep your credit forever.
What sets us apart
Built for grounded, agentic AI.
Three things every serious AI workload needs — your own data, the live web, and resilient model routing — wired into the same OpenAI-compatible endpoint.
-d '{"plugins":[{"id":"docs"}], ...}'
Encrypted document RAG
that just works.
Drop a PDF, DOCX, Markdown, or TXT file into your library. We chunk it, embed it with nomic-embed, store ciphertext in Postgres + pgvector, and from then on the chat grounds every answer in your own material — with bracket-numbered citations the model is told to cite.
- ChaCha20-Poly1305 encrypted at rest with your per-user key
- 50 MB free, then $0.02/GB/day — pay only for what you store
- Per-request
doc_idswhitelist for agent workflows
Live web search,
privacy-first.
Enable the Web Search plugin and every chat completion fans out to our self-hosted BitSeek backend, injecting fresh results as a system message before the model thinks. Your queries never go to Google, Bing, or any third-party tracker.
- Freshness filter: day · week · month · year
- News-mode endpoint for current-events grounding
- $0.000005/search — billed only on success
On timeout we walk the chain. Once the upstream returns response headers, we commit — no mid-stream failover.
Smart auto-routing
across the fleet.
Point at privaterouter/auto and we pick the cheapest healthy model that fits the job. Pass routing_tier to nudge the family — code, fast, or quality — and we still handle GPU outages with a per-attempt timeout walk.
- Connect-timeout failover before any tokens stream
- Per-user policy chain saved in your account
- Real-time health checks dictate the order
The platform
Everything you need to ship private AI.
Eighteen production features. One self-hosted stack. No closed-model resale, no surprise bills, no training on your prompts.
Drop-in compatible
Two lines.
You're live.
We speak the OpenAI dialect. Swap base_url and your existing SDK code, agent framework, or LangChain pipeline just works — against open models we host ourselves.
- Streaming, function-calling, JSON mode — all of it, the way you already wrote it.
- Works with the official OpenAI SDKs in Python, Node, Go, and the cURL command-line.
from openai import OpenAI
client = OpenAI(
base_url="https://api.privaterouter.com/v1",
api_key="pr-...",
)
stream = client.chat.completions.create(
model="privaterouter/auto",
messages=[
{"role": "user", "content": "Explain Bitcoin to a 5-year-old."}
],
stream=True,
)
for chunk in stream:
print(chunk.choices[0].delta.content or "", end="")OpenAI SDK compatible
Grab an API key. Ship in two lines.
Swap base_url to https://privaterouter.com/v1 and your existing SDK code, agent framework, or LangChain pipeline just works.
Embed anywhere
One script tag. Private AI on any site.
Drop the loader into your HTML. Visitors get a private-model chat bubble. You keep full control of the key, the persona, and the spend cap.
<script
src="https://privaterouter.com/widget/v1.js"
data-pr-widget="wgt_pr_xxxxxxxxxxxx"
data-pr-position="bottom-right"
async></script>- ▸ Public widget keys are scoped — they can't drain your account.
- ▸ Per-domain rate limits and origin pinning enforced server-side.
- ▸ Set a daily spend cap. The widget shuts off automatically when hit.
Run them anywhere — managed VPS for AI agents
PrivateRouter provides the AI brains. PureVPS provides the body. Together they're an agent.
Run Hermes Agent on a managed VPS — preconfigured for PrivateRouter API
Self-hosted OpenClaw coding agent on dedicated compute
OpenHands autonomous agent — preinstalled & ready
Browser-based AI chat — no install
A full ChatGPT-style chat, private and self-hosted.
Streaming responses, model switching, saved prompts, conversation history. All running against PrivateRouter's open models — no API key required to test, no data sent to OpenAI or Anthropic.
Free $1 credit on signup · No card required · All 8 models unlocked
Why PrivateRouter
Built for builders who need fast, honest, private inference.
Self-hosted open models
Qwen, Gemma, DeepSeek — running on infrastructure we control.
No closed-model resale
We do not proxy OpenAI or Anthropic. Open weights only.
Predictable pricing
Transparent per-token rates with no surprise overage fees.
Agent-ready APIs
OpenAI-compatible endpoints. Drop-in for your existing tooling.
Privacy-first infrastructure
Prompts are not retained for training. Logs minimized by default.
PureVPS agent hosting
Co-locate your agents next to the models. Sub-millisecond hops.
Frequently asked
Questions, answered.
What is PrivateRouter?
Is PrivateRouter OpenAI-compatible?
Which models can I use?
How does pricing work?
Do you log my prompts?
Can I pay with Bitcoin?
Where are the models hosted?
Can I run agents on PrivateRouter?
Is there a free trial?
Ship private AI today.
Sign up, grab an API key, point your OpenAI SDK at our base URL. $1 of free credit, all 8 open models, no card required.