Self-hosted open models · RAG + Web Search built in · No closed-model resale

Private AI Routing
for Builders.

One OpenAI-compatible API, eight open models, encrypted document RAG, live web search, and a built-in chat, all running on GPUs we operate. Pay with Bitcoin, ship in two lines.

OpenAI-compatible · Document RAG · Web Search · Self-hosted · Bitcoin payments
Migrating from OpenAI? See the 5-minute switch guide

One catalog. 5 open models live, 3 more incoming.

Every model below is self-hosted on PrivateRouter GPUs. No closed-model resale, no data sent to OpenAI or Anthropic. Pay per token, point your OpenAI SDK at privaterouter.com/v1, and ship.

OCR
3B · 8K ctx
privaterouter/deepseek-ocr

DeepSeek-OCR — token-efficient document understanding via Contexts Optical Compression. Multilingual, MIT-licensed.

in $0.20/M · out $0.80/M
Multimodal
12B · 8K ctx
privaterouter/gemma3

Gemma3 12B — Google's open weights with vision and broad knowledge.

in $0.30/M · out $0.60/M
OpenAI
117B · 131K ctx
privaterouter/gpt-oss-120b

OpenAI gpt-oss 120B — flagship Apache-2.0 MoE (~5B active). Configurable reasoning effort, native tool use, 131K context.

in $0.40/M · out $1.20/M
Long Context
8B · 128K ctx
privaterouter/llama3.1

Llama 3.1 8B — 128K context window for documents and long sessions.

in $0.20/M · out $0.40/M
Coding
30B · 33K ctx
privaterouter/qwen3-coder

Qwen3 Coder 30B — fast, accurate code generation across 100+ languages.

in $0.60/M · out $1.80/M
OpenAI
Coming soon
privaterouter/gpt-oss-20b

OpenAI gpt-oss 20B — Apache-2.0 MoE with native MXFP4. Fast, agentic, tool-use ready. 131K context.

21B · 131K ctx

Final benchmarking in progress. Launching once we lock in the MXFP4 throughput targets.

Frontier
Coming soon
privaterouter/kimi-k2.5

Kimi K2.5 — 1T-parameter MoE flagship (32B active) in NVIDIA's NVFP4 quantization. Self-hosted on Blackwell, lossless vs BF16 on coding & math benchmarks.

1000B · 256K ctx

Reserved on Blackwell GPU capacity (NVFP4 build). Get notified the moment it goes live.

Flagship
Coming soon
privaterouter/qwen3-32b

Qwen3 32B — deep reasoning, long-form planning, and tool use.

36B · 33K ctx

Reserved for an upcoming GPU capacity tier. Get notified when this model goes live.

Live catalog · Prices in USD per million tokens · OpenAI-compatible API
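To make the per-token pricing concrete, here is a small sketch that estimates what a single request costs at the rates listed for the live models. The helper name and the price table are illustrative only and are not part of the PrivateRouter API; actual billing may round differently.

```python
# Rates copied from the live catalog: (input, output) in USD per million tokens.
PRICES_USD_PER_M = {
    "privaterouter/deepseek-ocr": (0.20, 0.80),
    "privaterouter/gemma3": (0.30, 0.60),
    "privaterouter/gpt-oss-120b": (0.40, 1.20),
    "privaterouter/llama3.1": (0.20, 0.40),
    "privaterouter/qwen3-coder": (0.60, 1.80),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    price_in, price_out = PRICES_USD_PER_M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # A 2,000-token prompt with a 500-token answer on the coding model.
    cost = estimate_cost_usd("privaterouter/qwen3-coder", 2_000, 500)
    print(f"${cost:.4f}")  # prints $0.0021
```

At that size, $1 of credit covers roughly 476 such requests on the coding model.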
8 models live · 100% open-source

Free to start

Try every model with $1 of free credit.

No card required. Sign-up takes 30 seconds, and you keep your credit forever.

What sets us apart

Built for grounded, agentic AI.

Three things every serious AI workload needs — your own data, the live web, and resilient model routing — wired into the same OpenAI-compatible endpoint.

Your library
company-handbook.pdf · 47 chunks · ready
api-contracts.docx · 23 chunks · ready
design-system.md · 12 chunks · embedding
Q4-roadmap.pdf · 31 chunks · ready
$ curl /v1/chat/completions \
  -d '{"plugins":[{"id":"docs"}], ...}'

Encrypted document RAG
that just works.

Drop a PDF, DOCX, Markdown, or TXT file into your library. We chunk it, embed it with nomic-embed, and store the ciphertext in Postgres + pgvector. From then on, the chat grounds every answer in your own material, and the model is instructed to cite it with bracket-numbered references.

  • ChaCha20-Poly1305 encrypted at rest with your per-user key
  • 50 MB free, then $0.02/GB/day — pay only for what you store
  • Per-request doc_ids whitelist for agent workflows
Read the docs
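As a rough illustration of the request shape, the sketch below builds a chat-completions body with the docs plugin enabled. The `"plugins": [{"id": "docs"}]` part comes from the curl snippet above; where the `doc_ids` whitelist goes is an assumption (here it is a field on the plugin object), so check the docs for the real placement. The function name is hypothetical.

```python
import json
from typing import Optional


def build_docs_request(
    question: str,
    model: str = "privaterouter/auto",
    doc_ids: Optional[list[str]] = None,
) -> dict:
    """Build a chat-completions body that asks for document grounding."""
    plugin: dict = {"id": "docs"}  # plugin id as shown in the curl snippet
    if doc_ids:
        # ASSUMPTION: the per-request whitelist is sent as `doc_ids` on the
        # plugin object. The real field location may differ.
        plugin["doc_ids"] = list(doc_ids)
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "plugins": [plugin],
    }


if __name__ == "__main__":
    body = build_docs_request("What is our refund policy?", doc_ids=["doc_123"])
    print(json.dumps(body, indent=2))
```

With the official OpenAI Python SDK, a non-standard field such as `plugins` can be passed through the `extra_body` argument of `client.chat.completions.create`.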

Live web search,
privacy-first.

Enable the Web Search plugin and every chat completion fans out to our self-hosted BitSeek backend, injecting fresh results as a system message before the model thinks. Your queries never go to Google, Bing, or any third-party tracker.

  • Freshness filter: day · week · month · year
  • News-mode endpoint for current-events grounding
  • $0.000005/search — billed only on success
See plugin config
What the model sees
// system: web search results
[1] Anthropic raises $5B at $60B valuation
bloomberg.com · 2h ago
[2] JPMorgan deploys AI tools to 200k staff
reuters.com · 5h ago
[3] Qwen3-32B benchmarks beat Llama 3.1 70B
arxiv.org · 1d ago
Model reply: JPMorgan rolled out Suite AI to 200K employees this week [2], hot on the heels of Anthropic's funding round [1].
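The injection step can be pictured with a short sketch: search hits are numbered, rendered as text, and placed in a system message ahead of the conversation so the model can cite them as [1], [2], and so on. This is only an illustration of the idea; the class, function names, and exact message layout are assumptions, not the production format.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    source: str  # e.g. "reuters.com"
    age: str     # e.g. "5h ago"


def build_search_system_message(results: list[SearchResult]) -> dict:
    """Render numbered search results as one system message (hypothetical layout)."""
    lines = ["web search results"]
    for index, result in enumerate(results, start=1):
        lines.append(f"[{index}] {result.title}")
        lines.append(f"{result.source} · {result.age}")
    return {"role": "system", "content": "\n".join(lines)}


def inject_search_results(messages: list[dict], results: list[SearchResult]) -> list[dict]:
    """Return a new message list with the search context placed first."""
    return [build_search_system_message(results), *messages]


if __name__ == "__main__":
    hits = [
        SearchResult("Anthropic raises $5B at $60B valuation", "bloomberg.com", "2h ago"),
        SearchResult("JPMorgan deploys AI tools to 200k staff", "reuters.com", "5h ago"),
    ]
    for message in inject_search_results([{"role": "user", "content": "What's new in AI?"}], hits):
        print(message["role"], "->", message["content"])
```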
One alias → smart fallback
model: "privaterouter/auto"
routing_tier: "code"
1. privaterouter/qwen3-coder (primary)
2. privaterouter/deepseek-r1 (fallback 1)
3. privaterouter/qwen3-32b (fallback 2)

On timeout we walk the chain. Once the upstream returns response headers, we commit — no mid-stream failover.
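A minimal sketch of that behaviour, assuming a caller-supplied `open_stream` function that returns as soon as response headers arrive and raises `TimeoutError` otherwise. Both the function names and the 5-second default are hypothetical; the point is only that failover happens before the commit, never after.

```python
from typing import Callable, Iterable, Sequence

CHAIN = (
    "privaterouter/qwen3-coder",  # primary
    "privaterouter/deepseek-r1",  # fallback 1
    "privaterouter/qwen3-32b",    # fallback 2
)


def open_stream_with_failover(
    chain: Sequence[str],
    open_stream: Callable[[str, float], Iterable[str]],
    timeout_s: float = 5.0,
) -> tuple[str, Iterable[str]]:
    """Walk the chain and return (model, stream) for the first model that answers in time."""
    last_error = None
    for model in chain:
        try:
            # Expected to return once headers arrive, or raise TimeoutError.
            stream = open_stream(model, timeout_s)
        except TimeoutError as error:
            last_error = error
            continue  # try the next model in the chain
        # Headers received: commit to this model. Errors raised later while
        # reading the stream are NOT retried on another model.
        return model, stream
    raise RuntimeError("all models in the chain timed out") from last_error


if __name__ == "__main__":
    def fake_open(model: str, timeout_s: float) -> Iterable[str]:
        # Simulate an outage on the primary model only.
        if model == CHAIN[0]:
            raise TimeoutError("no response headers")
        return iter(["Hello", " world"])

    chosen, tokens = open_stream_with_failover(CHAIN, fake_open)
    print(chosen, "".join(tokens))
```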

Smart auto-routing
across the fleet.

Point at privaterouter/auto and we pick the cheapest healthy model that fits the job. Pass routing_tier (code, fast, or quality) to nudge the choice toward a model family. Either way, we handle GPU outages by walking the fallback chain with a per-attempt timeout.

  • Connect-timeout failover before any tokens stream
  • Per-user policy chain saved in your account
  • Real-time health checks dictate the order
How routing works
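One way to picture the selection step is the sketch below: keep only healthy models that match the requested tier, then order them cheapest first so the head of the list is the primary and the rest are fallbacks. The output prices are taken from the catalog, but the tier assignments, health flags, and all names are made up for illustration. The real router may weigh other signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    tiers: frozenset          # ASSUMPTION: which routing tiers a model serves
    price_out_per_m: float    # output price in USD per million tokens
    healthy: bool             # result of the latest health check


EXAMPLE_FLEET = [
    ModelInfo("privaterouter/llama3.1", frozenset({"fast"}), 0.40, True),
    ModelInfo("privaterouter/gemma3", frozenset({"fast", "quality"}), 0.60, True),
    ModelInfo("privaterouter/gpt-oss-120b", frozenset({"quality", "code"}), 1.20, True),
    ModelInfo("privaterouter/qwen3-coder", frozenset({"code"}), 1.80, False),
]


def build_chain(fleet: list[ModelInfo], routing_tier: Optional[str] = None) -> list[str]:
    """Return healthy candidates for the tier, cheapest first."""
    candidates = [
        model
        for model in fleet
        if model.healthy and (routing_tier is None or routing_tier in model.tiers)
    ]
    return [model.name for model in sorted(candidates, key=lambda m: m.price_out_per_m)]


if __name__ == "__main__":
    print(build_chain(EXAMPLE_FLEET, "fast"))
    print(build_chain(EXAMPLE_FLEET, "code"))
```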

The platform

Everything you need to ship private AI.

Eighteen production features. One self-hosted stack. No closed-model resale, no surprise bills, no training on your prompts.

Document Library (RAG)

Upload PDFs/DOCX/MD/TXT once — encrypted, embedded, indexed. The chat auto-grounds answers in your docs with bracket-numbered citations.

Web Search Grounding

Live web search injected into every chat completion. Self-hosted BitSeek backend — your queries never touch Google or Bing.

Response Healing

JSON-mode auto-repair when the model returns malformed output. Zero retries needed for structured-data pipelines.
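To give a feel for what JSON repair can involve, here is a deliberately simple sketch that handles two common failure modes: extra prose around the object and trailing commas. It is not the repair logic PrivateRouter runs, just an example of the technique, and the function name is hypothetical.

```python
import json
import re


def heal_json(text: str):
    """Parse model output as JSON, applying light repairs if the first attempt fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Repair 1: drop any prose or code-fence markers around the outermost object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    candidate = text[start : end + 1]

    # Repair 2: remove trailing commas before a closing brace or bracket.
    # Caveat: this naive regex would also touch such a pattern inside a string.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)

    return json.loads(candidate)  # still raises if the output is beyond repair


if __name__ == "__main__":
    raw = 'Sure! Here you go: {"name": "Ada", "tags": ["a", "b",],}'
    print(heal_json(raw))
```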

OpenAI-Compatible API

Drop-in /v1/chat/completions and /v1/embeddings. Point your SDK at our base URL and ship.

Web Chat Interface

Streaming chat at chat.privaterouter.com with model switching, saved prompts, and history.

Embeddable Widget

A single <script> tag drops a private-AI chat bubble onto any site. Bring your own key.

Model Playground

Compare answers side-by-side across Qwen, Gemma, DeepSeek before wiring up production.

Auto-Router

One model alias, smart routing across our fleet. Optimised for latency, cost, or quality.

Per-Key Analytics

Token spend, latency percentiles, and error rates broken down per API key in real time.

Public Leaderboard

Live rankings of the open models we host — by latency, throughput, and quality.

Saved Prompts

Reusable prompt templates and personas. Version them, share them, ship them.

Privacy Mode

Zero-retention prompts and responses. Logs minimised, never used for training.

OpenAI Migration

5-minute switch guide. Swap base_url, keep your code. Side-by-side benchmark included.

Spend Alerts

Daily or weekly spend thresholds with per-key webhook notifications, so a runaway agent never bankrupts you.

Bitcoin Payments

On-chain and Lightning top-ups. No card on file. Buy credits, burn them, repeat.

Agent Hosting (PureVPS)

Managed VPS for Hermes, OpenClaw, OpenHands, and browser agents — preconfigured for our API.

Teams & Workspaces

Shared credits, role-based access, and audit trails. Bring your team without billing chaos.

Live Status & Uptime

Real-time GPU node health, latency, and incident history. Subscribe to per-component alerts.

Drop-in compatible

Two lines.
You're live.

We speak the OpenAI dialect. Swap base_url and your existing SDK code, agent framework, or LangChain pipeline just works — against open models we host ourselves.

  • Streaming, function-calling, JSON mode — all of it, the way you already wrote it.
  • Works with the official OpenAI SDKs in Python, Node, and Go, and with cURL from the command line.
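from openai import OpenAI

# Point the official OpenAI SDK at PrivateRouter instead of api.openai.com.
client = OpenAI(
    base_url="https://privaterouter.com/v1",
    api_key="sk-pr-...",
)

# Request a streamed completion; "privaterouter/auto" lets the router pick the model.
stream = client.chat.completions.create(
    model="privaterouter/auto",
    messages=[
        {"role": "user", "content": "Explain Bitcoin to a 5-year-old."}
    ],
    stream=True,
)

# Print tokens as they arrive; some chunks carry no text, hence the `or ""`.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")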

OpenAI SDK compatible

Grab an API key. Ship in two lines.

Swap base_url to https://privaterouter.com/v1 and your existing SDK code, agent framework, or LangChain pipeline just works.
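If you would rather see the wire format than an SDK call, the sketch below builds the same kind of request with only the Python standard library. It uses the base URL quoted above and assumes the usual OpenAI convention of a Bearer token in the Authorization header; the helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://privaterouter.com/v1"  # base URL as quoted in the text above


def build_chat_request(
    api_key: str, prompt: str, model: str = "privaterouter/auto"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # ASSUMPTION: Bearer auth, following the OpenAI API convention.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network and a valid key)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_chat_request("sk-pr-...", "Explain Bitcoin to a 5-year-old.")
    print(request.get_method(), request.full_url)
```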

Embed anywhere

One script tag. Private AI on any site.

Drop the loader into your HTML. Visitors get a private-model chat bubble. You keep full control of the key, the persona, and the spend cap.

paste this into your <head>
<script
  src="https://privaterouter.com/widget/v1.js"
  data-pr-widget="wgt_pr_xxxxxxxxxxxx"
  data-pr-position="bottom-right"
  async></script>
  • Public widget keys are scoped — they can't drain your account.
  • Per-domain rate limits and origin pinning enforced server-side.
  • Set a daily spend cap. The widget shuts off automatically when hit.
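The guardrails above can be sketched as a small server-side check: reject requests from origins that are not pinned to the widget, and refuse new requests once the daily cap would be exceeded. This is an illustration under assumed names and rules, not a description of how PrivateRouter enforces them.

```python
from dataclasses import dataclass


@dataclass
class WidgetPolicy:
    allowed_origins: frozenset   # origins pinned to this widget key
    daily_cap_usd: float         # spend cap for the current day
    spent_today_usd: float = 0.0

    def authorize(self, origin: str, estimated_cost_usd: float) -> bool:
        """Return True only if the origin is pinned and the cap would not be exceeded."""
        if origin not in self.allowed_origins:
            return False
        if self.spent_today_usd + estimated_cost_usd > self.daily_cap_usd:
            return False  # cap reached: the widget stays off until the next day
        return True

    def record(self, cost_usd: float) -> None:
        """Add the cost of a completed request to today's total."""
        self.spent_today_usd += cost_usd


if __name__ == "__main__":
    policy = WidgetPolicy(frozenset({"https://example.com"}), daily_cap_usd=1.00)
    print(policy.authorize("https://example.com", 0.01))   # pinned origin, under cap
    print(policy.authorize("https://evil.example", 0.01))  # origin not pinned
```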
PrivateRouter
online · open model
live preview
👋 Hey! I'm the PrivateRouter widget — running on an open model, self-hosted. Ask me anything.
What models do you support?
Qwen, Gemma, DeepSeek and a few others — all open weights, all hosted on infrastructure we control. No proxying to OpenAI.
Can I pay with Bitcoin?
Yep. On-chain or Lightning — top up credits, no card required.
powered by privaterouter · self-hosted open models

Pay per use — no subscriptions

Pay only for the tokens you use.

Top up once. Burn credits at per-model token rates. No monthly bill, no auto-renewals, no surprise charges — credits never expire. Start with $1 of free credit, no card required.

Free tier
$1 credit on signup · all 8 models · 20 RPM

No card required. Test every model end-to-end before you spend a cent.

Start free
Indie · $10 credit · Kick the tires
Builder · $25 credit · Ship a side project
Team (most chosen) · $50 credit · Run agents for weeks
Scale · $100 credit · Production workloads
Enterprise · $500 credit · Volume discount
Credits never expire
Hard spend caps stop runaway agents
All 8 models, same price for everyone

Browser-based AI chat — no install

A full ChatGPT-style chat, private and self-hosted.

Streaming responses, model switching, saved prompts, conversation history. All running against PrivateRouter's open models — no API key required to test, no data sent to OpenAI or Anthropic.

privaterouter.com/chat
Demo
New conversation
qwen3-coder · 30B
Write a Python function that reverses a linked list iteratively.
Streaming responses · Switch models mid-thread · Conversation history · Privacy mode

Free $1 credit on signup · No card required · All 8 models unlocked

Why PrivateRouter

Built for builders who need fast, honest, private inference.

Self-hosted open models

Qwen, Gemma, DeepSeek — running on infrastructure we control.

No closed-model resale

We do not proxy OpenAI or Anthropic. Open weights only.

Predictable pricing

Transparent per-token rates with no surprise overage fees.

Agent-ready APIs

OpenAI-compatible endpoints. Drop-in for your existing tooling.

Privacy-first infrastructure

Prompts are not retained for training. Logs minimized by default.

PureVPS agent hosting

Co-locate your agents next to the models. Sub-millisecond hops.

Frequently asked

Questions, answered.

What is PrivateRouter?
PrivateRouter is a privacy-first AI inference platform. We host open-weight models (Qwen, DeepSeek, Llama, Gemma, Nomic-Embed) on our own GPUs and expose them through an OpenAI-compatible API, a streaming web chat, and an embeddable widget. There's no closed-model resale — we're not a wrapper over OpenAI or Anthropic.
Is PrivateRouter OpenAI-compatible?
Yes. Our /v1/chat/completions and /v1/embeddings endpoints accept the standard OpenAI request shape. To migrate, change your SDK's base_url to https://privaterouter.com/v1 and swap your API key for an sk-pr-... key. Existing OpenAI SDK code, agent frameworks, and LangChain pipelines work without changes.
Which models can I use?
Every account gets access to every model — all 8 of them: Qwen3 (8B / 32B), Qwen3 Coder 30B, DeepSeek-R1, Gemma3 12B, Gemma4, Llama 3.1 8B (128K context), and Nomic-Embed-Text for embeddings. Same catalog whether you're on the free tier or topping up $500 at a time.
How does pricing work?
Pure pay-per-use, like OpenRouter. Per-token pricing in USD per million tokens, billed against prepaid credits. No subscriptions, no monthly fees. Signing up gets you $1 of free credit; top up from $10 to $500 when you're ready. Credits never expire. There are no overage charges: your spend is capped at your remaining balance.
Do you log my prompts?
By default we keep minimal request metadata (timestamps, token counts, latency) to bill accurately and serve our public leaderboard. Enable Privacy Mode on a per-key basis for zero-retention prompts and responses. We never train on customer prompts.
Can I pay with Bitcoin?
Yes. We accept on-chain Bitcoin top-ups via our native payment module. Buy credit packs from $10 to $500, no card required. Stripe and credit cards are also supported for one-time top-ups.
Where are the models hosted?
On rented GPU nodes (currently Vast.ai consumer/datacenter GPUs). The autoscaler matches capacity to demand — cheaper consumer GPUs at low volume, scaling up to enterprise GPUs as load grows. No data leaves our infrastructure to closed-model providers.
Can I run agents on PrivateRouter?
Yes. Our PureVPS Agent Hosting packages come preconfigured for Hermes, OpenClaw, OpenHands, and browser agents — with PrivateRouter API credits bundled in. Or use our API directly from your own infrastructure.
Is there a free trial?
Yes. Signing up gets you $1 of API credit and access to every model, enough to try them all end-to-end with no card on file. Top up only when you're ready.

Ship private AI today.

Sign up, grab an API key, point your OpenAI SDK at our base URL. $1 of free credit, all 8 open models, no card required.