Plugins

PrivateRouter plugins are opt-in features that extend chat completions without changing your model code. Manage them at /settings/plugins in the dashboard.

All plugins are per-user: enabling one applies to every API key on the account, every web-chat conversation, and every embedded widget. They are free to use — you only pay for the underlying model tokens.

Available plugins

Plugin	Status	What it does
Web Search	Enabled	Augment responses with live web search results via BitSeek; queries stay on PrivateRouter.
Document Library	Enabled	Ground answers against your uploaded PDFs/DOCX/MD/TXT. See Document Library.
PDF Inputs	Enabled	Attach PDFs to chat messages — text + structure extracted server-side so every model can read them.
Response Healing	Enabled	Auto-repair malformed JSON when you set `response_format={"type":"json_object"}`.
Pareto Router	Enabled	Task-aware quality tiers for `privaterouter/auto`. Per-request override via `routing_tier`.

Coming Soon plugins render in the UI as locked toggles. The API rejects any attempt to enable them via curl with a 422 — "This plugin is coming soon — enabling is not yet available." We'll flip them on account-wide when the backing implementation ships.

Response Healing

When you ask any model for structured output via response_format={"type":"json_object"}, the model may return text that looks like JSON but has trailing commas, unquoted keys, or stray characters around the object. Response Healing detects this server-side and runs a fast structural repair pass before the response reaches your client.

Strategy jsonrepair (default): zero extra LLM cost. Repairs the output with the json-repair library — handles trailing commas, single quotes, unquoted keys, truncated objects, code-fence wrappers.
Strategy llm_retry: re-prompts the same model with the broken output and a "fix this JSON" instruction. Slower, deeper. Costs one additional completion at the same per-token rate.

Healing only fires when:

The plugin is enabled for your account.
The request body has response_format.type == "json_object".
The model's reply is non-empty and fails JSON.parse.

Healing never raises — a healing bug falls through to the model's original output so your completions can't be broken by it.
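The three conditions and the fall-through behavior above can be sketched as a small gate plus a guarded repair call. This is an illustration only, not PrivateRouter's implementation: `should_heal`, `naive_repair`, and `heal_or_passthrough` are hypothetical names, and the toy repair step handles only trailing commas, far less than the jsonrepair strategy described above.

```python
import json
import re


def should_heal(plugin_enabled: bool, request_body: dict, reply: str) -> bool:
    """Return True only when all three documented conditions hold."""
    response_format = request_body.get("response_format") or {}
    if not plugin_enabled or response_format.get("type") != "json_object":
        return False
    if not reply.strip():
        return False  # empty replies are left alone
    try:
        json.loads(reply)
    except json.JSONDecodeError:
        return True  # non-empty and unparseable: candidate for healing
    return False  # already valid JSON, nothing to do


def naive_repair(text: str) -> str:
    """Toy stand-in for a repair pass: removes trailing commas only."""
    return re.sub(r",\s*([}\]])", r"\1", text.strip())


def heal_or_passthrough(plugin_enabled: bool, request_body: dict, reply: str) -> str:
    """Heal when eligible; on any failure, return the model's original output."""
    if not should_heal(plugin_enabled, request_body, reply):
        return reply
    try:
        repaired = naive_repair(reply)
        json.loads(repaired)  # accept the repair only if it now parses
        return repaired
    except Exception:
        return reply  # healing never raises; fall through to the original
```

The broad `except` is deliberate in this sketch: it mirrors the guarantee that a bug in the healer cannot break a completion.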

Streaming

Healing also works with "stream": true. When you ask for both streaming AND response_format=json_object, PrivateRouter transparently routes the call through the non-streaming path under the hood (so the healer can see the full output), then SSE-wraps the final healed response into a single chat.completion.chunk + usage chunk + [DONE]. This is invisible to your OpenAI SDK — clients consume the stream normally and get valid JSON. The X-PrivateRouter-Pseudo-Stream: 1 response header tells you it happened, which is handy when debugging.

This trade-off is intentional: progressive deltas don't help JSON-mode clients (they parse the whole object), and routing through the non-streaming path is the only way to give the healer a complete payload before the client sees it. If you don't need healing for a specific streaming call, just omit response_format.
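To make the pseudo-stream shape concrete, here is a rough sketch of wrapping a finished completion into the three SSE events described above. The exact chunk fields are an assumption modeled on OpenAI-style streaming chunks, and `pseudo_stream` is a hypothetical helper, not PrivateRouter's code.

```python
import json


def pseudo_stream(completion: dict) -> list[str]:
    """Wrap a finished completion as SSE events: content chunk, usage chunk, [DONE]."""
    base = {
        "id": completion["id"],
        "object": "chat.completion.chunk",
        "model": completion["model"],
    }
    message = completion["choices"][0]["message"]
    # One chunk carries the entire healed content instead of progressive deltas.
    content_chunk = {
        **base,
        "choices": [{
            "index": 0,
            "delta": {"role": "assistant", "content": message["content"]},
            "finish_reason": "stop",
        }],
    }
    # A second chunk carries only the usage totals (assumed layout).
    usage_chunk = {**base, "choices": [], "usage": completion["usage"]}
    events = [content_chunk, usage_chunk]
    return [f"data: {json.dumps(e)}\n\n" for e in events] + ["data: [DONE]\n\n"]
```

A client that concatenates `delta.content` across chunks ends up with the full JSON string after the first event, which is why existing streaming code keeps working unchanged.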

Example

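import json

from openai import OpenAI

client = OpenAI(
    base_url="https://api.privaterouter.com/v1",
    api_key="sk-pr-...",
)

resp = client.chat.completions.create(
    model="privaterouter/qwen3",
    response_format={"type": "json_object"},
    messages=[
        {"role": "user", "content": "Reply with a JSON object: {name, age}"},
    ],
)

# With healing enabled, the content should parse even if the raw model output
# had minor syntax errors. Healing falls through to the original output when a
# repair fails, so wrap this in try/except in production code.
data = json.loads(resp.choices[0].message.content)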

Pareto Router

privaterouter/auto normally picks the cheapest healthy model on your plan. Pareto Router adds tier hints so you can ask for the right kind of model without naming one directly. Three tiers:

Tier	Preference order	Use it for
`fast`	No preferred family; uses the cheapest healthy chat model.	Casual chat, bulk classification, short prompts.
`code`	`qwen3-coder` → `deepseek-coder` → `coder*`	Programming tasks, code review, refactors.
`quality`	`deepseek-r1` → `qwen3-32b` → `gpt-oss-120b`	Deep reasoning, long context, hard analysis.

Tier resolution always checks health and plan access — if your preferred-family model is unhealthy or not on your plan, the router falls through to the next preference, and finally to the cheapest healthy model so requests never fail because of a missing tier.
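The resolution order above might look roughly like the following. The catalog format (dicts with `name`, `healthy`, `on_plan`, `price`) and the `resolve_tier` helper are assumptions made for illustration, as is picking the cheapest model when a wildcard such as `coder*` matches more than one.

```python
from fnmatch import fnmatch

# Preference patterns per tier, taken from the table above; "fast" has none.
TIER_PREFERENCES = {
    "fast": [],
    "code": ["qwen3-coder", "deepseek-coder", "coder*"],
    "quality": ["deepseek-r1", "qwen3-32b", "gpt-oss-120b"],
}


def resolve_tier(tier, models):
    """Pick a model name for `tier` from a hypothetical catalog of model dicts."""
    # Health and plan access are checked before any preference is considered.
    usable = [m for m in models if m["healthy"] and m["on_plan"]]
    for pattern in TIER_PREFERENCES.get(tier, []):
        matches = [m for m in usable if fnmatch(m["name"], pattern)]
        if matches:
            return min(matches, key=lambda m: m["price"])["name"]
    if not usable:
        return None  # nothing healthy on the plan at all; not a tier miss
    # Final fallback: the cheapest healthy model, so a tier miss never fails.
    return min(usable, key=lambda m: m["price"])["name"]
```

For example, with `qwen3-coder` marked unhealthy, a `code` request in this sketch resolves to `deepseek-coder`; with no coder model available at all, it resolves to the cheapest healthy model.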

Per-request override

Pass routing_tier as a top-level body field. It's stripped before the request reaches the upstream model — your model code stays clean.

curl https://api.privaterouter.com/v1/chat/completions \
  -H "Authorization: Bearer $PRIVA...KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "privaterouter/auto",
    "routing_tier": "code",
    "messages": [
      {"role": "user", "content": "Refactor this Python function for readability."}
    ]
  }'

The response's model field reflects the model that actually served the request — useful for billing reconciliation:

{
  "model": "privaterouter/qwen3-coder",
  "choices": [...],
  "usage": {...}
}
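For callers not using curl, the same override can be sent from Python. The sketch below uses only the standard library; `build_tier_request` and `served_model` are hypothetical helper names, and actually sending the request requires a valid API key.

```python
import json
import urllib.request

API_URL = "https://api.privaterouter.com/v1/chat/completions"


def build_tier_request(api_key: str, prompt: str, tier: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with a routing_tier override."""
    body = {
        "model": "privaterouter/auto",
        "routing_tier": tier,  # top-level body field, as in the curl example
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def served_model(api_key: str, prompt: str, tier: str) -> str:
    """Send the request and return the response's `model` field."""
    with urllib.request.urlopen(build_tier_request(api_key, prompt, tier)) as resp:
        return json.load(resp)["model"]
```

Logging the value returned by `served_model` alongside your own request ID is one simple way to reconcile usage against the model that actually served each call.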

Saved default

If you don't pass routing_tier, the router uses the Default tier saved on your Plugins page. The default applies to:

API calls that target privaterouter/auto
Web chat (when the chat UI selects privaterouter/auto)
Embedded widgets that use privaterouter/auto

It does not override an explicit model name — if you ask for privaterouter/qwen3-coder, you get privaterouter/qwen3-coder regardless of tier.
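The precedence described above (explicit model, then per-request tier, then saved default) fits in a few lines. `effective_tier` is a hypothetical name used only for this sketch.

```python
AUTO_MODEL = "privaterouter/auto"


def effective_tier(model, request_tier=None, saved_default="fast"):
    """Return the tier to apply, or None when an explicit model was requested."""
    if model != AUTO_MODEL:
        return None  # explicit model names always win; tiers are ignored
    # A per-request routing_tier overrides the default saved on the Plugins page.
    return request_tier or saved_default
```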

Composing with auto-routing fallback

Pareto Router and the auto-router's fallback chain are complementary. The resolved tier model is prepended to your existing fallback chain (deduplicated), so:

If tier = code and your fallback chain is [qwen3, gemma3], the router tries qwen3-coder → qwen3 → gemma3, moving to the next model each time an attempt hits its per-attempt timeout.
If the tier-preferred model is unhealthy, the router still walks the whole chain. You won't see a 503 because of a tier miss.
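The prepend-and-deduplicate step, and the walk along the resulting chain, can be sketched as follows. Both function names are hypothetical, and `attempt` stands in for whatever actually calls a model; here it is assumed to raise `TimeoutError` on a per-attempt timeout.

```python
def build_attempt_chain(tier_model, fallback_chain):
    """Prepend the tier model to the fallback chain, dropping duplicates in order."""
    chain = []
    for name in [tier_model, *fallback_chain]:
        if name is not None and name not in chain:
            chain.append(name)
    return chain


def first_success(chain, attempt):
    """Try each model in order and return (model_name, result) for the first success."""
    for name in chain:
        try:
            return name, attempt(name)
        except TimeoutError:
            continue  # per-attempt timeout: move on to the next model
    raise RuntimeError("every model in the chain timed out")
```

With a tier model of `qwen3-coder` and a fallback chain of `["qwen3", "gemma3"]`, `build_attempt_chain` returns `["qwen3-coder", "qwen3", "gemma3"]`; if the tier model is already in the chain, it is moved to the front rather than repeated.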

API reference

`GET /api/plugins`

Returns the caller's plugin settings. Auto-creates a row on first read with safe defaults (response_healing + pareto_router ON, others OFF).

curl https://api.privaterouter.com/api/plugins \
  -H "Authorization: Bearer $PRIVA...KEY"

{
  "web_search_enabled": false,
  "web_search_coming_soon": true,
  "pdf_inputs_enabled": false,
  "pdf_inputs_coming_soon": true,
  "response_healing_enabled": true,
  "response_healing_config": {"strategy": "jsonrepair"},
  "response_healing_coming_soon": false,
  "pareto_router_enabled": true,
  "pareto_router_config": {"default_tier": "fast"},
  "pareto_router_coming_soon": false
}

`PUT /api/plugins`

Partial update. Only fields you include are touched; omitted fields keep their current value. Per-plugin *_config objects are replaced wholesale when present.

curl -X PUT https://api.privaterouter.com/api/plugins \
  -H "Authorization: Bearer $PRIVA...KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pareto_router_config": {"default_tier": "code"},
    "response_healing_config": {"strategy": "jsonrepair"}
  }'

Returns the full settings shape (same as GET) so the UI can reconcile optimistic updates without a second round-trip.
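The update semantics are a shallow merge: top-level fields in the request overwrite, everything else is kept, and a `*_config` object is swapped in whole rather than merged key by key. A minimal model of that behavior, with `apply_partial_update` as a hypothetical name:

```python
def apply_partial_update(current: dict, patch: dict) -> dict:
    """Return new settings: included fields overwrite, omitted fields are kept."""
    # Shallow merge only. A *_config dict in the patch replaces the stored one
    # entirely, so any keys missing from the patch's config are dropped.
    return {**current, **patch}
```

One consequence, assuming a config object ever holds more than one key: to change a single key, read the current config with GET, modify it, and send the whole object back in the PUT.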

FAQ

Will I be charged for healing? Not with the default strategy. The jsonrepair strategy runs locally on PrivateRouter with zero token spend. The llm_retry strategy costs one additional completion at the upstream model's per-token rate.

Can I disable Pareto for a single key? Not yet. Per-key plugin overrides are on the roadmap. Today, plugin settings are account-wide.

Does Pareto affect non-privaterouter/auto requests? No. Explicit model names always win. Pareto only modifies the chain when the requested model is privaterouter/auto.

Why does the response model field differ from the request? That's intentional: the response reports the model that actually served the request. It differs from the request whenever you send the sentinel (privaterouter/auto), and it may be a healthy fallback rather than your tier-preferred model if that model was unavailable. Bill against the response model, not the request.