Self-hosted open models · RAG + Web Search built in · No closed-model resale

Private AI Routing
for Builders.

One OpenAI-compatible API, a catalog of open models, encrypted document RAG, live web search, and a built-in chat, all running on GPUs we operate. Pay with Bitcoin, ship in two lines.

Start Building · View Models

OpenAI-compatible · Document RAG · Web Search · Self-hosted · Bitcoin payments

Migrating from OpenAI? See the 5-minute switch guide

One catalog. 5 open models live, 2 more incoming.

Every model below is self-hosted on PrivateRouter GPUs. No closed-model resale, no data sent to OpenAI or Anthropic. Pay per token, point your OpenAI SDK at privaterouter.com/v1, and ship.

OCR

3B · 8K ctx

privaterouter/deepseek-ocr

DeepSeek-OCR — token-efficient document understanding via Contexts Optical Compression. Multilingual, MIT-licensed.

in $0.20/M · out $0.80/M

Agentic

31B · 66K ctx

privaterouter/gemma4-31b

Gemma 4 31B — Google's next-gen dense model, tuned for agentic tasks, tool use, and structured output. 32K context.

in $0.40/M · out $1.20/M

OpenAI

117B · 131K ctx

privaterouter/gpt-oss-120b

OpenAI gpt-oss 120B — flagship Apache-2.0 MoE (~5B active). Configurable reasoning effort, native tool use, 131K context.

in $0.40/M · out $1.20/M

Coding

30B · 33K ctx

privaterouter/qwen3-coder

Qwen3 Coder 30B — fast, accurate code generation across 100+ languages.

in $0.60/M · out $1.80/M

Agentic

35B · 66K ctx

privaterouter/qwen3.6-35b

Qwen3.6 35B — dense agentic model. Strong reasoning, tool use, and long-form planning. 32K context.

in $0.40/M · out $1.20/M

OpenAI

Coming soon

privaterouter/gpt-oss-20b

OpenAI gpt-oss 20B — Apache-2.0 MoE with native MXFP4. Fast, agentic, tool-use ready. 131K context.

21B · 131K ctx

Final benchmarking in progress. Launching once we lock in the MXFP4 throughput targets.

Frontier

Coming soon

privaterouter/kimi-k2.5

Kimi K2.5 — 1T-parameter MoE flagship (32B active) in NVIDIA's NVFP4 quantization. Self-hosted on Blackwell, lossless vs BF16 on coding & math benchmarks.

1000B · 256K ctx

Reserved on Blackwell GPU capacity (NVFP4 build). Get notified the moment it goes live.

Live catalog · Prices in USD per million tokens · OpenAI-compatible API
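As a rough illustration of per-token billing, the sketch below estimates what one request would cost at the rates listed in the catalog above. The `PRICES` table copies the listed USD-per-million-token rates; the `estimate_cost` helper is hypothetical and exists only for this example. It is not part of the PrivateRouter API, and actual billing may round differently.

```python
from decimal import Decimal

# USD per million tokens as (input, output), copied from the catalog listing.
PRICES = {
    "privaterouter/deepseek-ocr": (Decimal("0.20"), Decimal("0.80")),
    "privaterouter/gemma4-31b": (Decimal("0.40"), Decimal("1.20")),
    "privaterouter/gpt-oss-120b": (Decimal("0.40"), Decimal("1.20")),
    "privaterouter/qwen3-coder": (Decimal("0.60"), Decimal("1.80")),
    "privaterouter/qwen3.6-35b": (Decimal("0.40"), Decimal("1.20")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost of a single request."""
    price_in, price_out = PRICES[model]
    return (price_in * input_tokens + price_out * output_tokens) / MILLION


# 2,000 prompt tokens and 500 completion tokens on the coding model.
cost = estimate_cost("privaterouter/qwen3-coder", 2000, 500)
print(f"${cost}")
```

Using `Decimal` rather than floats keeps very small per-request amounts exact, which matters when they are summed over many requests.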

100% open-source

Free to start

Try every model with $1 of free credit.

No card required. Signup takes 30 seconds and you keep your credit forever.

Get $1 free credit · Try the chat

What sets us apart

Built for grounded, agentic AI.

Three things every serious AI workload needs — your own data, the live web, and resilient model routing — wired into the same OpenAI-compatible endpoint.

Your library

company-handbook.pdf · 47 chunks · ready
api-contracts.docx · 23 chunks · ready
design-system.md · 12 chunks · embedding
Q4-roadmap.pdf · 31 chunks · ready

$ curl /v1/chat/completions \
    -d '{"plugins":[{"id":"docs"}], ...}'

Encrypted document RAG
that just works.

Drop a PDF, DOCX, Markdown, or TXT file into your library. We chunk it, embed it with nomic-embed, and store the ciphertext in Postgres + pgvector. From then on, the chat grounds every answer in your own material, with bracket-numbered citations that the model is instructed to include.

ChaCha20-Poly1305 encrypted at rest with your per-user key
50 MB free, then $0.02/GB/day — pay only for what you store
Per-request doc_ids whitelist for agent workflows
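To make the per-request whitelist concrete, here is a minimal sketch of a request body that enables the document plugin and restricts grounding to specific documents. The plugin id `docs` and the field name `doc_ids` come from this page. Placing `doc_ids` inside the plugin object is an assumption, as are the `build_docs_request` helper and the `doc_123` identifier, so check the docs for the exact shape.

```python
import json


def build_docs_request(question, doc_ids=None, model="privaterouter/auto"):
    """Hypothetical helper: chat-completions body with the docs plugin enabled."""
    plugin = {"id": "docs"}
    if doc_ids:
        # Assumed placement: the whitelist travels inside the plugin object.
        plugin["doc_ids"] = list(doc_ids)
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "plugins": [plugin],
    }


# "doc_123" is a made-up identifier used only for illustration.
body = build_docs_request("What is our refund policy?", doc_ids=["doc_123"])
print(json.dumps(body, indent=2))
```

With the official OpenAI Python SDK, non-standard fields like `plugins` can be passed through the `extra_body` argument of `client.chat.completions.create`.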

Read the docs

Live web search,
privacy-first.

Enable the Web Search plugin and every chat completion fans out to our self-hosted BitSeek backend, injecting fresh results as a system message before the model thinks. Your queries never go to Google, Bing, or any third-party tracker.

Freshness filter: day · week · month · year
News-mode endpoint for current-events grounding
$0.000005/search — billed only on success

See plugin config

What the model sees

// system: web search results

[1] Anthropic raises $5B at $60B valuation

bloomberg.com · 2h ago

[2] JPMorgan deploys AI tools to 200k staff

reuters.com · 5h ago

[3] Qwen3-32B benchmarks beat Llama 3.1 70B

arxiv.org · 1d ago

Model reply: JPMorgan rolled out Suite AI to 200K employees this week [2], hot on the heels of Anthropic's funding round [1]…
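The injection step shown above can be sketched as follows: search hits are numbered, rendered into one system message, and placed ahead of the conversation so the model can cite them by bracket number. This is an illustrative reconstruction, not the platform's actual code. The `SearchResult` type, both function names, and the exact wording of the system message are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    source: str
    age: str


def build_search_system_message(results):
    """Render search hits as a numbered list inside one system message."""
    lines = ["Web search results. Cite sources by their bracket number."]
    for number, result in enumerate(results, start=1):
        lines.append(f"[{number}] {result.title} ({result.source}, {result.age})")
    return {"role": "system", "content": "\n".join(lines)}


def inject_search_results(messages, results):
    """Return a new message list with the search results placed first."""
    return [build_search_system_message(results), *messages]


hits = [
    SearchResult("Anthropic raises $5B at $60B valuation", "bloomberg.com", "2h ago"),
    SearchResult("JPMorgan deploys AI tools to 200k staff", "reuters.com", "5h ago"),
]
conversation = [{"role": "user", "content": "What happened in AI this week?"}]
grounded = inject_search_results(conversation, hits)
print(grounded[0]["content"])
```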

One alias → smart fallback

model: "privaterouter/auto"

routing_tier: "code"

privaterouter/qwen3-coder

primary

privaterouter/deepseek-r1

fallback 1

privaterouter/qwen3-32b

fallback 2

On timeout we walk the chain. Once the upstream returns response headers, we commit — no mid-stream failover.

Smart auto-routing
across the fleet.

Point at privaterouter/auto and we pick the cheapest healthy model that fits the job. Pass routing_tier to nudge the family — code, fast, or quality — and we still handle GPU outages with a per-attempt timeout walk.

Connect-timeout failover before any tokens stream
Per-user policy chain saved in your account
Real-time health checks dictate the order
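The chain walk described above can be sketched in a few lines. Each model gets one attempt with its own timeout, the next model is tried only on a timeout or connection failure, and the first model that starts responding wins. This is a simplified illustration under stated assumptions, not the router's implementation. `call_with_fallback`, `AllModelsFailed`, and the `send` callable are hypothetical names, and the real order is driven by health checks rather than a fixed list.

```python
class AllModelsFailed(Exception):
    """Raised when every model in the chain timed out or refused the connection."""


def call_with_fallback(chain, send, timeout_s=5.0):
    """Try each model in order; return (model, response) from the first that answers."""
    attempts = []
    for model in chain:
        try:
            # `send` is expected to raise TimeoutError when no response
            # headers arrive within `timeout_s`. Once it returns, we commit.
            return model, send(model, timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            attempts.append((model, type(exc).__name__))
    raise AllModelsFailed(attempts)


CHAIN = [
    "privaterouter/qwen3-coder",
    "privaterouter/deepseek-r1",
    "privaterouter/qwen3-32b",
]


def flaky_send(model, timeout_s):
    """Stand-in transport: the primary times out, every other model answers."""
    if model == CHAIN[0]:
        raise TimeoutError("no response headers")
    return f"reply from {model}"


winner, reply = call_with_fallback(CHAIN, flaky_send)
print(winner)
```

Only failures that occur before a response begins trigger a fallback here, which mirrors the rule that there is no mid-stream failover.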

How routing works

The platform

Everything you need to ship private AI.

Eighteen production features. One self-hosted stack. No closed-model resale, no surprise bills, no training on your prompts.

Document Library (RAG)

Upload PDFs/DOCX/MD/TXT once — encrypted, embedded, indexed. The chat auto-grounds answers in your docs with bracket-numbered citations.

Web Search Grounding

Live web search injected into every chat completion. Self-hosted BitSeek backend — your queries never touch Google or Bing.

Response Healing

JSON-mode auto-repair when the model returns malformed output. Zero retries needed for structured-data pipelines.

OpenAI-Compatible API

Drop-in /v1/chat/completions and /v1/embeddings. Point your SDK at our base URL and ship.

Web Chat Interface

Streaming chat at chat.privaterouter.com with model switching, saved prompts, and history.

Embeddable Widget

A single <script> tag drops a private-AI chat bubble onto any site. Bring your own key.

Model Playground

Compare answers side-by-side across Qwen, Gemma, DeepSeek before wiring up production.

Auto-Router

One model alias, smart routing across our fleet. Optimised for latency, cost, or quality.

Per-Key Analytics

Token spend, latency percentiles, and error rates broken down per API key in real time.

Public Leaderboard

Live rankings of the open models we host — by latency, throughput, and quality.

Saved Prompts

Reusable prompt templates and personas. Version them, share them, ship them.

Privacy Mode

Zero-retention prompts and responses. Logs minimised, never used for training.

OpenAI Migration

5-minute switch guide. Swap base_url, keep your code. Side-by-side benchmark included.

Spend Alerts

Daily and weekly spend thresholds with per-key webhook notifications, so a runaway agent never bankrupts you.

Bitcoin Payments

On-chain and Lightning top-ups. No card on file. Buy credits, burn them, repeat.

Agent Hosting (PureVPS)

Managed VPS for Hermes, OpenClaw, OpenHands, and browser agents — preconfigured for our API.

Teams & Workspaces

Shared credits, role-based access, and audit trails. Bring your team without billing chaos.

Live Status & Uptime

Real-time GPU node health, latency, and incident history. Subscribe to per-component alerts.
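Response Healing, listed above, repairs malformed JSON-mode output on the server. The repair rules it applies are not described on this page, so the sketch below only illustrates the general idea with two common fixes: unwrapping a Markdown code fence and dropping trailing commas. The `heal_json` function is hypothetical and could equally serve as a client-side safety net.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker, built indirectly for readability
FENCE_RE = re.compile(
    r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL
)
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def heal_json(raw: str):
    """Parse model output as JSON, applying light repairs only if strict parsing fails."""
    text = raw.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Repair 1: unwrap a Markdown code fence around the payload.
    fenced = FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1)
    # Repair 2: drop commas that sit directly before a closing brace or bracket.
    # Caveat: this naive regex would also alter such sequences inside string values.
    text = TRAILING_COMMA_RE.sub(r"\1", text)
    return json.loads(text)  # still raises JSONDecodeError if unrepairable


print(heal_json('{"items": [1, 2, ], "done": true,}'))
```

Valid input is returned untouched because strict parsing is tried first, so the repairs can never change output that was already well-formed.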

Drop-in compatible

Two lines.
You're live.

We speak the OpenAI dialect. Swap base_url and your existing SDK code, agent framework, or LangChain pipeline just works — against open models we host ourselves.

Streaming, function-calling, JSON mode — all of it, the way you already wrote it.
Works with the official OpenAI SDKs for Python, Node, and Go, and with plain cURL from the command line.

from openai import OpenAI

client = OpenAI(
    base_url="https://privaterouter.com/v1",
    api_key="sk-pr-...",
)

stream = client.chat.completions.create(
    model="privaterouter/auto",
    messages=[
        {"role": "user", "content": "Explain Bitcoin to a 5-year-old."}
    ],
    stream=True,
)

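for chunk in stream:
    # Some chunks carry no choices or no text; skip them rather than print None.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)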

OpenAI SDK compatible

Grab an API key. Ship in two lines.

Swap base_url to https://privaterouter.com/v1 and your existing SDK code, agent framework, or LangChain pipeline just works.

Create API key · Read the docs

Embed anywhere

One script tag. Private AI on any site.

Drop the loader into your HTML. Visitors get a private-model chat bubble. You keep full control of the key, the persona, and the spend cap.

paste this into your <head>

<script
  src="https://privaterouter.com/widget/v1.js"
  data-pr-widget="wgt_pr_xxxxxxxxxxxx"
  data-pr-position="bottom-right"
  async></script>

▸ Public widget keys are scoped — they can't drain your account.
▸ Per-domain rate limits and origin pinning enforced server-side.
▸ Set a daily spend cap. The widget shuts off automatically when hit.

Read the widget docs

PrivateRouter

online · open model

live preview

👋 Hey! I'm the PrivateRouter widget — running on an open model, self-hosted. Ask me anything.

What models do you support?

Qwen, Gemma, DeepSeek and a few others — all open weights, all hosted on infrastructure we control. No proxying to OpenAI.

Can I pay with Bitcoin?

Yep. On-chain or Lightning — top up credits, no card required.

Run your agents anywhere with a managed VPS

PrivateRouter provides the AI brains. PureVPS provides the body. Together they're an agent.

Featured

Hermes Agent VPS

Run Hermes Agent on a managed VPS — preconfigured for PrivateRouter API

$49.00/mo

+$10.00 in API credits

View plan →

Featured

OpenClaw VPS

Self-hosted OpenClaw coding agent on dedicated compute

$59.00/mo

+$15.00 in API credits

View plan →

Featured

OpenHands VPS

OpenHands autonomous agent — preinstalled & ready

$59.00/mo

+$15.00 in API credits

View plan →

View all VPS plans →

Pay per use — no subscriptions

Pay only for the tokens you use.

Top up once. Burn credits at per-model token rates. No monthly bill, no auto-renewals, no surprise charges — credits never expire. Start with $1 of free credit, no card required.

Free tier

$1 credit on signup · all 8 models · 20 RPM

No card required. Test every model end-to-end before you spend a cent.

Start free

Indie

$10 credit

Kick the tires

Top up

Builder

$25 credit

Ship a side project

Top up

Most chosen

Team

$50 credit

Run agents for weeks

Top up

Scale

$100 credit

Production workloads

Top up

Enterprise

$500 credit

Volume discount

Top up

Credits never expire

Hard spend caps stop runaway agents

All 8 models, same price for everyone

See per-model token pricing

Browser-based AI chat — no install

A full ChatGPT-style chat, private and self-hosted.

Streaming responses, model switching, saved prompts, conversation history. All running against PrivateRouter's open models — no API key required to test, no data sent to OpenAI or Anthropic.

privaterouter.com/chat

Demo

New conversation

qwen3-coder · 30B

Write a Python function that reverses a linked list iteratively.

Send a message…

Open chat

Streaming responses · Switch models mid-thread · Conversation history · Privacy mode

Try the chat now · Compare models side-by-side

Free $1 credit on signup · No card required · All 8 models unlocked

Why PrivateRouter

Built for builders who need fast, honest, private inference.

Self-hosted open models

Qwen, Gemma, DeepSeek — running on infrastructure we control.

No closed-model resale

We do not proxy OpenAI or Anthropic. Open weights only.

Predictable pricing

Transparent per-token rates with no surprise overage fees.

Agent-ready APIs

OpenAI-compatible endpoints. Drop-in for your existing tooling.

Privacy-first infrastructure

Prompts are not retained for training. Logs minimized by default.

PureVPS agent hosting

Co-locate your agents next to the models. Sub-millisecond hops.

Frequently asked

Questions, answered.

What is PrivateRouter?

PrivateRouter is a privacy-first AI inference platform. We host open-weight models (Qwen, DeepSeek, Llama, Gemma, Nomic-Embed) on our own GPUs and expose them through an OpenAI-compatible API, a streaming web chat, and an embeddable widget. There's no closed-model resale — we're not a wrapper over OpenAI or Anthropic.

Is PrivateRouter OpenAI-compatible?

Yes. Our /v1/chat/completions and /v1/embeddings endpoints accept the standard OpenAI request shape. To migrate, change your SDK's base_url to https://privaterouter.com/v1 and swap your API key for an sk-pr-... key. Existing OpenAI SDK code, agent frameworks, and LangChain pipelines work without changes.

Which models can I use?

Every account gets access to every model — all 8 of them: Qwen3 (8B / 32B), Qwen3 Coder 30B, DeepSeek-R1, Gemma3 12B, Gemma4, Llama 3.1 8B (128K context), and Nomic-Embed-Text for embeddings. Same catalog whether you're on the free tier or topping up $500 at a time.

How does pricing work?

Pure pay-per-use, like OpenRouter. Pricing is per token, quoted in USD per million tokens and billed against prepaid credits. No subscriptions, no monthly fees. Signing up gets you $1 of free credit; top up from $10 to $500 when you're ready. Credits never expire. There are no overage charges: spending stops when your balance reaches zero.

Do you log my prompts?

By default we keep minimal request metadata (timestamps, token counts, latency) to bill accurately and serve our public leaderboard. Enable Privacy Mode on a per-key basis for zero-retention prompts and responses. We never train on customer prompts.

Can I pay with Bitcoin?

Yes. On-chain Bitcoin top-ups run through our native payment module. Buy credit packs from $10 to $500, no card required. Stripe and credit cards are also supported for one-time top-ups.

Where are the models hosted?

On rented GPU nodes (currently Vast.ai consumer/datacenter GPUs). The autoscaler matches capacity to demand — cheaper consumer GPUs at low volume, scaling up to enterprise GPUs as load grows. No data leaves our infrastructure to closed-model providers.

Can I run agents on PrivateRouter?

Yes. Our PureVPS Agent Hosting packages come preconfigured for Hermes, OpenClaw, OpenHands, and browser agents — with PrivateRouter API credits bundled in. Or use our API directly from your own infrastructure.

Is there a free trial?

Yes. Signing up gets you $1 of API credit and access to every model, enough to try them all end-to-end with no card on file. Top up only when you're ready.

Ship private AI today.

Create your free account · Try the web chat

Read the docs · 5-min migration from OpenAI · Browse model benchmarks · GitHub
