Plexus
A unified LLM API gateway handling protocol translation, failover, and usage tracking. Maintained by Matt Cowger (@mcowger on the Synthetic Discord, which hosts a dedicated #plexus support thread).
Currently the most practical way to use multiple AI providers through a single endpoint without rewriting client code per API format. Routes OpenAI, Anthropic, Gemini, and any OpenAI-compatible provider through one interface.
Also supports OAuth-backed providers (GitHub Copilot, Claude, Codex, Gemini CLI) without API key management. Currently the only gateway with built-in vision fallthrough for non-vision models.
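Because Plexus exposes a standard OpenAI-compatible endpoint, existing client code only needs a different base URL. A minimal sketch of what a request looks like; the base URL, alias name (`fast-chat`), and key placeholder are hypothetical, not defaults shipped by Plexus:

```python
# Sketch: pointing an OpenAI-style client at a Plexus gateway.
# Only the endpoint changes -- the request body is standard OpenAI format,
# and Plexus resolves the model alias to a concrete backend.
import json

PLEXUS_BASE = "http://localhost:8080/v1"  # hypothetical deployment URL

def build_chat_request(alias: str, prompt: str) -> dict:
    """Standard OpenAI chat-completions payload; the alias is resolved
    by whichever selector the gateway is configured with."""
    return {
        "url": f"{PLEXUS_BASE}/chat/completions",
        "headers": {
            "Authorization": "Bearer <plexus-api-key>",  # placeholder
            "Content-Type": "application/json",
        },
        "body": {
            "model": alias,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request("fast-chat", "Hello")
print(json.dumps(req["body"]))
```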
Pros:
- Routes between providers automatically with configurable selectors.
- Converts API formats bidirectionally — send Anthropic-style requests to OpenAI backends and vice versa.
- Vision fallthrough lets cheap models handle images via automatic text description.
- Per-request cost tracking with quota enforcement by tokens, requests, or dollars. Web UI for configuration.
- Best-in-class conversion accuracy: higher translation fidelity than LiteLLM, AxonHub, and similar tools.
Cons:
- Adds ~20-50ms latency vs direct provider calls.
- Self-hosted means you manage uptime.
- SQLite default won’t scale past single-node; PostgreSQL required for production.
Vision Fallthrough
- Unique capability allowing vision aliases to work with non-vision backends.
- Intercepts images, sends to a descriptor model (Gemini Flash, GPT-5.3-Codex) for text conversion, passes descriptions to target.
- Enables specialized models to handle image inputs transparently without native vision support.
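A rough sketch of the fallthrough idea, assuming OpenAI-style multimodal content parts; `describe_image` stands in for the real call to a descriptor model (e.g. Gemini Flash) and just returns a stub here:

```python
# Sketch of vision fallthrough: image parts in a multimodal message are
# replaced with a text description before the request reaches a
# non-vision backend.
def describe_image(image_url: str) -> str:
    # Real implementation would call the configured descriptor model.
    return f"[image description of {image_url}]"

def fall_through(message: dict) -> dict:
    """Rewrite multimodal content parts into a single text string."""
    parts = message["content"]
    if isinstance(parts, str):        # already text-only: pass through
        return message
    text_chunks = []
    for part in parts:
        if part["type"] == "image_url":
            text_chunks.append(describe_image(part["image_url"]["url"]))
        else:
            text_chunks.append(part["text"])
    return {**message, "content": "\n".join(text_chunks)}
```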
Routing & Selection
Model aliases backed by multiple providers:
| Selector | Behavior |
| -------- | -------- |
| `random` | Distribute across healthy targets (default) |
| `in_order` | Failover sequence |
| `cost` | Cheapest provider wins |
| `performance` | Highest tokens/sec |
| `latency` | Lowest time-to-first-token |
`priority: api_match` enables pass-through for same-format providers, skipping transformation overhead.
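The selectors above can be sketched as a pure function over an alias's target list. Target fields and provider names here are illustrative, not Plexus's actual configuration schema:

```python
# Sketch of alias target selection for a few of the documented selectors.
# Unhealthy targets (e.g. in cooldown) are skipped before selection.
import random

TARGETS = [
    {"provider": "openai",    "healthy": True,  "cost_per_mtok": 2.50, "tps": 80},
    {"provider": "anthropic", "healthy": True,  "cost_per_mtok": 3.00, "tps": 95},
    {"provider": "gemini",    "healthy": False, "cost_per_mtok": 0.30, "tps": 120},
]

def select(targets: list, selector: str) -> dict:
    healthy = [t for t in targets if t["healthy"]]
    if selector == "random":
        return random.choice(healthy)                    # default: distribute
    if selector == "in_order":
        return healthy[0]                                # failover sequence
    if selector == "cost":
        return min(healthy, key=lambda t: t["cost_per_mtok"])
    if selector == "performance":
        return max(healthy, key=lambda t: t["tps"])      # highest tokens/sec
    raise ValueError(f"unknown selector: {selector}")
```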
Protocol Translation
Bidirectional conversion between:
- OpenAI chat completions (`/v1/chat/completions`)
- OpenAI responses (`/v1/responses`): full support for stateful multi-turn via `previous_response_id`, 7-day TTL storage, and function calling
- Anthropic messages (`/v1/messages`)
- Google Gemini native format (`/v1beta`)
Handles streaming SSE and tool use normalization across all formats.
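One direction of the conversion can be sketched as a small mapping function. This covers only the basic message shape; Plexus's actual translator also handles tool use, streaming SSE, and richer content blocks:

```python
# Minimal sketch: an Anthropic /v1/messages request body mapped to an
# OpenAI /v1/chat/completions body.
def anthropic_to_openai(req: dict) -> dict:
    messages = []
    if "system" in req:
        # Anthropic: top-level system prompt; OpenAI: a system message.
        messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):
            # Anthropic content blocks -> flat text (text blocks only here).
            content = "".join(b["text"] for b in content if b["type"] == "text")
        messages.append({"role": m["role"], "content": content})
    return {
        "model": req["model"],
        "messages": messages,
        "max_tokens": req.get("max_tokens"),
    }
```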
Deep Inspection
- Per-request capture of full request/response bodies, transformation steps, routing decisions, and timing breakdowns.
- Every request gets a UUID; trace the full lifecycle from ingress to provider response.
- Stores 30 days of debug logs by default (configurable).
- View timing waterfall: parse → route → transform → provider roundtrip → transform → serialize.
- See exactly which provider handled the request and why.
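The trace structure above can be sketched as a UUID-keyed record of per-stage durations; the stage names follow the waterfall, but the field names are illustrative rather than Plexus's actual log schema:

```python
# Sketch of a per-request trace: every request gets a UUID, and each
# waterfall stage records its duration in milliseconds.
import uuid

STAGES = ["parse", "route", "transform_in", "provider", "transform_out", "serialize"]

def new_trace() -> dict:
    return {"request_id": str(uuid.uuid4()), "timings_ms": {}}

def record(trace: dict, stage: str, ms: float) -> None:
    trace["timings_ms"][stage] = ms

def total_latency(trace: dict) -> float:
    """Total request latency is the sum of all recorded stage durations."""
    return sum(trace["timings_ms"].values())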
Pros:
- Complete visibility into routing decisions and transformation errors.
- Request/response bodies aid debugging client issues without reproducing.
- Performance metrics (TTFT, TPS, total latency) per provider/model help identify regressions.
- Full error context including stack traces when transformations fail.
Cons:
- Capturing full bodies increases storage significantly — ~10-50KB per request depending on payload size.
- Default SQLite deployment will grow quickly under high load; PostgreSQL recommended for inspection-heavy use.
Quota Enforcement
- Per-API-key limits: `tokens`, `requests`, or `cost`.
- Windows: rolling, daily, weekly.
- Hard stops when exceeded.
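The hard-stop check can be sketched as an in-memory rolling-window counter; the real gateway persists usage, and the class below is illustrative, not Plexus's implementation:

```python
# Sketch of per-key quota enforcement over a rolling window for a single
# metric (tokens, requests, or cost). Requests that would exceed the
# limit are hard-stopped.
import time

class Quota:
    def __init__(self, metric: str, limit: float, window_s: float):
        self.metric, self.limit, self.window_s = metric, limit, window_s
        self.events = []  # list of (timestamp, amount) pairs

    def usage(self, now: float) -> float:
        cutoff = now - self.window_s
        # Drop events that have rolled out of the window.
        self.events = [(t, a) for t, a in self.events if t >= cutoff]
        return sum(a for _, a in self.events)

    def check_and_record(self, amount: float, now: float = None) -> bool:
        """Return False (hard stop) if the request would exceed the limit."""
        now = time.time() if now is None else now
        if self.usage(now) + amount > self.limit:
            return False
        self.events.append((now, amount))
        return True
```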
Provider Cooldowns
- Automatic exponential backoff on failure: 2 min → 4 min → 8 min → … → 5 hr cap.
- Automatic parsing of quota and Retry-After headers to set cooldowns.
- Successful requests reset counter.
- Optionally disable per-provider.
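The backoff schedule above reduces to a one-line formula, a minimal sketch of the documented 2-minute-doubling-to-5-hour-cap behavior:

```python
# Cooldown schedule: exponential backoff starting at 2 minutes, doubling
# per consecutive failure, capped at 5 hours. A successful request
# resets consecutive_failures to zero (not shown).
def cooldown_seconds(consecutive_failures: int, cap_s: int = 5 * 3600) -> int:
    base_s = 2 * 60  # first failure: 2 minutes
    return min(base_s * 2 ** (consecutive_failures - 1), cap_s)
```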
MCP Proxy
Model Context Protocol server proxy with per-request session isolation.
- HTTP transport only; per-request isolation prevents tool sprawl across clients.
Encryption at Rest
- Optional AES-256-GCM for API keys, OAuth tokens, MCP headers.
- Generate a key once with `openssl rand -hex 32`; existing plaintext secrets are migrated automatically on first boot with the key set.