Plugins
PrivateRouter plugins are opt-in features that extend chat completions without changing your model code. Manage them at /settings/plugins in the dashboard.
All plugins are per-user: enabling one applies to every API key on the account, every web-chat conversation, and every embedded widget. They are free to use — you only pay for the underlying model tokens.
Available plugins
| Plugin | Status | What it does |
|---|---|---|
| Web Search | Enabled | Augment responses with live web search results via BitSeek; queries stay on PrivateRouter. |
| Document Library | Enabled | Ground answers against your uploaded PDFs/DOCX/MD/TXT. See Document Library. |
| PDF Inputs | Enabled | Attach PDFs to chat messages — text + structure extracted server-side so every model can read them. |
| Response Healing | Enabled | Auto-repair malformed JSON when you set response_format={"type":"json_object"}. |
| Pareto Router | Enabled | Task-aware quality tiers for privaterouter/auto. Per-request override via routing_tier. |
Coming Soon plugins render in the UI as locked toggles. The API rejects any attempt to enable them via curl with a 422: "This plugin is coming soon — enabling is not yet available." We'll flip them on account-wide when the backing implementation ships.
Response Healing
When you ask any model for structured output via
response_format={"type":"json_object"}, the model may return text that
looks like JSON but has trailing commas, unquoted keys, or stray
characters around the object. Response Healing detects this server-side
and runs a fast structural repair pass before the response reaches
your client.
- Strategy jsonrepair (default): zero extra LLM cost. Repairs the output with the json-repair library, which handles trailing commas, single quotes, unquoted keys, truncated objects, and code-fence wrappers.
- Strategy llm_retry: re-prompts the same model with the broken output and a "fix this JSON" instruction. Slower, deeper. Costs one additional completion at the same per-token rate.
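To make the idea of a structural repair pass concrete, here is a deliberately small Python sketch. It is not PrivateRouter's implementation and it is not the json-repair library; the function name `naive_repair` is hypothetical, and it covers only two of the cases listed above (stray characters or code fences around the object, and trailing commas).

```python
import json
import re


def naive_repair(text: str) -> str:
    """Illustrative repair: keep the outermost {...} and drop trailing commas."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    candidate = text[start:end + 1]
    # Remove commas that sit directly before a closing brace or bracket.
    # Caveat: this naive regex would also rewrite such sequences inside string values.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    json.loads(candidate)  # raises ValueError if the result is still invalid
    return candidate


broken = '```json\n{"name": "Ada", "age": 36,}\n```'
print(json.loads(naive_repair(broken)))
```

A real repair library tokenizes the input rather than using a regex, which is what lets it handle unquoted keys and truncated objects safely.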
Healing only fires when:
- The plugin is enabled for your account.
- The request body has response_format.type == "json_object".
- The model's reply is non-empty and fails JSON.parse.
Healing never raises — a healing bug falls through to the model's original output so your completions can't be broken by it.
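The three trigger conditions and the "never raises" guarantee can be sketched as a single guard function. This is a minimal illustration under assumptions, not the server's code: `maybe_heal` and `strip_trailing_comma` are hypothetical names, and the repair routine is passed in so the gating logic stays separate from the repair itself.

```python
import json
from typing import Callable


def maybe_heal(enabled: bool, body: dict, reply: str,
               repair: Callable[[str], str]) -> str:
    """Return a healed reply when all three conditions hold, else the original."""
    wants_json = (body.get("response_format") or {}).get("type") == "json_object"
    if not (enabled and wants_json and reply):
        return reply
    try:
        json.loads(reply)
        return reply  # already valid, nothing to heal
    except ValueError:
        pass
    try:
        healed = repair(reply)
        json.loads(healed)  # only accept output that really parses
        return healed
    except Exception:
        return reply  # a healer failure must never break the completion


def strip_trailing_comma(text: str) -> str:
    # Hypothetical stand-in for a real repair routine.
    return text.replace(",}", "}")


body = {"response_format": {"type": "json_object"}}
print(maybe_heal(True, body, '{"ok": true,}', strip_trailing_comma))
```

Because the fallback returns the model's original text, client code should still be prepared for a reply that does not parse.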
Streaming
Healing also works with "stream": true. When you ask for both
streaming AND response_format=json_object, PrivateRouter
transparently routes the call through the non-streaming path under the
hood (so the healer can see the full output), then SSE-wraps the final
healed response into a single chat.completion.chunk + usage chunk +
[DONE]. This is invisible to your OpenAI SDK — clients consume the
stream normally and get valid JSON. The X-PrivateRouter-Pseudo-Stream: 1 response header tells you it happened, which is handy when
debugging.
This trade-off is intentional: progressive deltas don't help JSON-mode
clients (they parse the whole object), and routing through the
non-streaming path is the only way to give the healer a complete
payload before the client sees it. If you don't need healing for a
specific streaming call, just omit response_format.
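For clients that read the SSE stream without an SDK, consuming a pseudo-stream looks the same as consuming any other stream. The sketch below concatenates delta content until [DONE] and then parses the result. The sample frames are hypothetical, modelled on the OpenAI chunk format rather than captured from PrivateRouter, and `collect_json_from_sse` is an illustrative name.

```python
import json


def collect_json_from_sse(lines):
    """Concatenate delta content from SSE `data:` lines and parse it as JSON."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return json.loads("".join(parts))


# Hypothetical frames: one content chunk, one usage chunk, then [DONE].
content = json.dumps({"name": "Ada", "age": 36})
frames = [
    "data: " + json.dumps({
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {"content": content}}],
    }),
    "",
    "data: " + json.dumps({
        "object": "chat.completion.chunk",
        "choices": [],
        "usage": {"total_tokens": 42},
    }),
    "",
    "data: [DONE]",
]
print(collect_json_from_sse(frames))
```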
Example
import json
from openai import OpenAI
client = OpenAI(
    base_url="https://api.privaterouter.com/v1",
    api_key="sk-pr-...",
)
resp = client.chat.completions.create(
    model="privaterouter/qwen3",
    response_format={"type": "json_object"},
    messages=[
        {"role": "user", "content": "Reply with a JSON object: {name, age}"},
    ],
)
# Malformed JSON is repaired server-side before the reply reaches the client.
data = json.loads(resp.choices[0].message.content)
Pareto Router
privaterouter/auto normally picks the cheapest healthy model on your
plan. Pareto Router adds tier hints so you can ask for the right
kind of model without naming one directly. Three tiers:
| Tier | Preference order | Use it for |
|---|---|---|
| fast | Falls back to cheapest healthy chat model. | Casual chat, bulk classification, short prompts. |
| code | qwen3-coder → deepseek-coder → coder* | Programming tasks, code review, refactors. |
| quality | deepseek-r1 → qwen3-32b → gpt-oss-120b | Deep reasoning, long context, hard analysis. |
Tier resolution always checks health and plan access — if your preferred-family model is unhealthy or not on your plan, the router falls through to the next preference, and finally to the cheapest healthy model so requests never fail because of a missing tier.
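The resolution order described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the router's source: the `healthy`, `on_plan`, and `price` keys are hypothetical, `coder*` is treated as a glob pattern, and ties within a pattern are broken alphabetically.

```python
from fnmatch import fnmatch

# Preference patterns taken from the tier table; "fast" has no preferred family.
TIER_PREFERENCES = {
    "fast": [],
    "code": ["qwen3-coder", "deepseek-coder", "coder*"],
    "quality": ["deepseek-r1", "qwen3-32b", "gpt-oss-120b"],
}


def resolve_tier(tier, models):
    """Pick a model name for `tier`.

    `models` maps model name -> dict with the hypothetical keys
    "healthy", "on_plan" and "price".
    """
    usable = {n: m for n, m in models.items() if m["healthy"] and m["on_plan"]}
    for pattern in TIER_PREFERENCES.get(tier, []):
        matches = sorted(n for n in usable if fnmatch(n, pattern))
        if matches:
            return matches[0]
    if not usable:
        raise LookupError("no healthy model on plan")
    # Final fallback: the cheapest healthy model.
    return min(usable, key=lambda n: usable[n]["price"])


models = {
    "qwen3-coder": {"healthy": False, "on_plan": True, "price": 3},
    "deepseek-coder": {"healthy": True, "on_plan": True, "price": 4},
    "qwen3": {"healthy": True, "on_plan": True, "price": 1},
}
print(resolve_tier("code", models))     # deepseek-coder
print(resolve_tier("quality", models))  # qwen3
```

In the example, qwen3-coder is unhealthy, so the code tier moves to the next preference, and the quality tier has no usable match and ends at the cheapest healthy model.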
Per-request override
Pass routing_tier as a top-level body field. It's stripped before
the request reaches the upstream model — your model code stays clean.
curl https://api.privaterouter.com/v1/chat/completions \
  -H "Authorization: Bearer $PRIVA...KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "privaterouter/auto",
    "routing_tier": "code",
    "messages": [
      {"role": "user", "content": "Refactor this Python function for readability."}
    ]
  }'
The response's model field reflects the model that actually served the
request — useful for billing reconciliation:
{
  "model": "privaterouter/qwen3-coder",
  "choices": [...],
  "usage": {...}
}
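From the OpenAI Python SDK, one way to send a non-standard top-level field is the SDK's `extra_body` option, which merges extra keys into the JSON request body. The sketch below only builds the keyword arguments so it runs without network access; `auto_request` is a hypothetical helper, and it assumes the endpoint treats the field exactly as in the curl call.

```python
def auto_request(prompt: str, tier: str) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": "privaterouter/auto",
        "messages": [{"role": "user", "content": prompt}],
        # extra_body keys are merged into the top level of the JSON body.
        "extra_body": {"routing_tier": tier},
    }


kwargs = auto_request("Refactor this Python function for readability.", "code")
# With a configured client:
#   resp = client.chat.completions.create(**kwargs)
#   resp.model then names the model that actually served the request.
print(kwargs["extra_body"])
```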
Saved default
If you don't pass routing_tier, the router uses the Default tier
saved on your Plugins page. The default applies to:
- API calls that target privaterouter/auto
- Web chat (when the chat UI selects privaterouter/auto)
- Embedded widgets that use privaterouter/auto
It does not override an explicit model name — if you ask for
privaterouter/qwen3-coder, you get privaterouter/qwen3-coder regardless
of tier.
Composing with auto-routing fallback
Pareto Router and the auto-router's fallback chain are complementary. The resolved tier model is prepended to your existing fallback chain (deduplicated), so:
- If tier = code and your fallback chain is [qwen3, gemma3], the router tries qwen3-coder → qwen3 → gemma3 on per-attempt timeout.
- If the tier-preferred model is unhealthy, the router still walks the whole chain. You won't see a 503 because of a tier miss.
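The prepend-and-deduplicate step is small enough to show directly. A minimal sketch, assuming the chain is an ordered list of model names and that the first occurrence wins when a name appears twice; `compose_chain` is a hypothetical name.

```python
def compose_chain(tier_model, fallback_chain):
    """Prepend the tier model and drop duplicates, keeping first occurrences."""
    chain = []
    for name in [tier_model, *fallback_chain]:
        if name not in chain:
            chain.append(name)
    return chain


print(compose_chain("qwen3-coder", ["qwen3", "gemma3"]))
# ['qwen3-coder', 'qwen3', 'gemma3']
print(compose_chain("qwen3", ["qwen3", "gemma3"]))
# ['qwen3', 'gemma3']
```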
API reference
GET /api/plugins
Returns the caller's plugin settings. Auto-creates a row on first read
with safe defaults (response_healing + pareto_router ON, others OFF).
curl https://api.privaterouter.com/api/plugins \
  -H "Authorization: Bearer $PRIVA...KEY"
{
  "web_search_enabled": false,
  "web_search_coming_soon": true,
  "pdf_inputs_enabled": false,
  "pdf_inputs_coming_soon": true,
  "response_healing_enabled": true,
  "response_healing_config": {"strategy": "jsonrepair"},
  "response_healing_coming_soon": false,
  "pareto_router_enabled": true,
  "pareto_router_config": {"default_tier": "fast"},
  "pareto_router_coming_soon": false
}
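A client that renders toggles needs one state per plugin rather than separate flags. The sketch below derives that from the response shape shown above, assuming every plugin follows the `<name>_enabled` / `<name>_coming_soon` naming pattern; `plugin_states` and the state labels are hypothetical.

```python
def plugin_states(settings: dict) -> dict:
    """Collapse *_enabled / *_coming_soon flags into one state per plugin."""
    states = {}
    for key, value in settings.items():
        if not key.endswith("_enabled"):
            continue
        name = key[: -len("_enabled")]
        if settings.get(f"{name}_coming_soon"):
            states[name] = "coming_soon"  # locked toggle in the UI
        else:
            states[name] = "on" if value else "off"
    return states


sample = {
    "web_search_enabled": False,
    "web_search_coming_soon": True,
    "response_healing_enabled": True,
    "response_healing_config": {"strategy": "jsonrepair"},
    "response_healing_coming_soon": False,
}
print(plugin_states(sample))
# {'web_search': 'coming_soon', 'response_healing': 'on'}
```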
PUT /api/plugins
Partial update. Only fields you include are touched; omitted fields keep
their current value. Per-plugin *_config objects are replaced wholesale
when present.
curl -X PUT https://api.privaterouter.com/api/plugins \
  -H "Authorization: Bearer $PRIVA...KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pareto_router_config": {"default_tier": "code"},
    "response_healing_config": {"strategy": "jsonrepair"}
  }'
Returns the full settings shape (same as GET) so the UI can reconcile optimistic updates without a second round-trip.
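The update semantics (partial at the top level, wholesale replacement for *_config objects) amount to a shallow merge. The following sketch models that behaviour locally; it is an illustration of the documented rule, not the server's code, and the `note` key is a hypothetical extra field used only to show that it gets dropped.

```python
def apply_partial_update(current: dict, patch: dict) -> dict:
    """Keys in `patch` overwrite; omitted keys keep their current value.

    *_config objects are replaced as a whole, never deep-merged.
    """
    updated = dict(current)
    updated.update(patch)
    return updated


current = {
    "response_healing_enabled": True,
    "response_healing_config": {"strategy": "llm_retry", "note": "hypothetical"},
    "pareto_router_config": {"default_tier": "fast"},
}
patch = {"response_healing_config": {"strategy": "jsonrepair"}}
result = apply_partial_update(current, patch)
print(result["response_healing_config"])  # {'strategy': 'jsonrepair'}
print(result["pareto_router_config"])     # {'default_tier': 'fast'}
```

The practical consequence is that a PUT should carry the complete config object for any plugin it touches, because keys left out of that object are not preserved.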
FAQ
Will I be charged for healing? Not with the default strategy. jsonrepair
runs locally on PrivateRouter with zero token spend. The llm_retry strategy
costs one additional completion at the upstream model's per-token rate.
Can I disable Pareto for a single key? Not yet. Per-key plugin overrides are on the roadmap. Today, plugin settings are account-wide.
Does Pareto affect non-privaterouter/auto requests? No. Explicit
model names always win. Pareto only modifies the chain when the
requested model is privaterouter/auto.
Why does the response model field differ from the request? That's
intentional. The response names the model that actually served the
request, which may differ from the sentinel you sent (privaterouter/auto)
and may be a healthy fallback if your tier-preferred model was
unavailable. Bill against the response model, not the request.