Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| plexus [2026/04/10 21:57] – created matt_cowger.us | plexus [2026/04/10 23:06] (current) – xenolandscapes | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | Test | + | ====== Plexus ====== |
| + | |||
| + | [[https:// | ||
| + | |||
| + | A unified LLM API gateway handling protocol translation, | ||
| + | |||
| + | Currently the most practical way to use multiple AI providers through a single endpoint without rewriting client code per API format. Routes OpenAI, Anthropic, Gemini, and any OpenAI-compatible provider through one interface. | ||
| + | |||
| + | Also supports OAuth-backed providers (GitHub Copilot, Claude, Codex, Gemini CLI) without API key management. Currently the only gateway with built-in vision fallthrough for non-vision models. | ||
| + | |||
| + | Pros: | ||
| + | |||
| + | * Routes between providers automatically with configurable selectors. | ||
| + | * Converts API formats bidirectionally — send Anthropic-style requests to OpenAI backends and vice versa. | ||
| + | * Vision fallthrough lets cheap models handle images via automatic text description. | ||
| + | * Per-request cost tracking with quota enforcement by tokens, requests, or dollars. Web UI for configuration. | ||
| + | * Best available conversion accuracy - exceeds the translation validity of LiteLLM, AxonHub and other similar tools. | ||
| + | |||
| + | Cons: | ||
| + | |||
| + | * Adds ~20-50ms latency vs direct provider calls. | ||
| + | * Self-hosted means you manage uptime. | ||
| + | * SQLite default won't scale past single-node; | ||
| + | |||
| + | |||
| + | **Vision Fallthrough** | ||
| + | |||
| + | * Unique capability allowing vision aliases to work with non-vision backends. | ||
| + | * Intercepts images, sends to a descriptor model (Gemini Flash, GPT-5.3-Codex) for text conversion, passes descriptions to target. | ||
| + | * Enables specialized models to handle image inputs transparently without native vision support. | ||
| + | |||
| + | **Routing & Selection** | ||
| + | |||
| + | Model aliases backed by multiple providers: | ||
| + | |||
| + | | Selector | Behavior | | ||
| + | |----------|----------| | ||
| + | | '' | ||
| + | | '' | ||
| + | | '' | ||
| + | | '' | ||
| + | | '' | ||
| + | |||
| + | '' | ||
| + | |||
| + | **Protocol Translation** | ||
| + | |||
| + | Bidirectional conversion between: | ||
| + | - OpenAI chat completions (''/ | ||
| + | - OpenAI responses (''/ | ||
| + | - Full OpenAI ''/ | ||
| + | - Anthropic messages (''/ | ||
| + | - Google Gemini native format (''/ | ||
| + | |||
| + | Handles streaming SSE and tool use normalization across all formats. | ||
| + | |||
| + | **Deep Inspection** | ||
| + | |||
| + | * Per-request capture of full request/ | ||
| + | * Every request gets a UUID; trace the full lifecycle from ingress to provider response. | ||
| + | * Stores 30 days of debug logs by default (configurable). | ||
| + | * View timing waterfall: parse → route → transform → provider roundtrip → transform → serialize. | ||
| + | * See exactly which provider handled the request and why. | ||
| + | |||
| + | Pros: | ||
| + | * Complete visibility into routing decisions and transformation errors. | ||
| + | * Request/ | ||
| + | * Performance metrics (TTFT, TPS, total latency) per provider/ | ||
| + | * Full error context including stack traces when transformations fail. | ||
| + | |||
| + | Cons: | ||
| + | * Capturing full bodies increases storage significantly — ~10-50KB per request depending on payload size. | ||
| + | * Default SQLite deployment will grow quickly under high load; PostgreSQL recommended for inspection-heavy use. | ||
| + | |||
| + | **Quota Enforcement** | ||
| + | |||
| + | * Per-API-key limits: `tokens`, `requests`, or `cost`. | ||
| + | * Windows: rolling, daily, weekly. | ||
| + | * Hard stops when exceeded. | ||
| + | |||
| + | **Provider Cooldowns** | ||
| + | |||
| + | * Automatic exponential backoff on failure: 2 min → 4 min → 8 min → ... → 5 hr cap. | ||
| + | * Automatic parsing of quota and Retry-After headers to set cooldowns. | ||
| + | * Successful requests reset counter. | ||
| + | * Optionally disable per-provider. | ||
| + | |||
| + | **MCP Proxy** | ||
| + | |||
| + | Model Context Protocol server proxy with per-request session isolation. | ||
| + | * HTTP transport only; prevents tool sprawl across clients. | ||
| + | |||
| + | **Encryption at Rest** | ||
| + | |||
| + | * Optional AES-256-GCM for API keys, OAuth tokens, MCP headers. | ||
| + | * Generate once: `openssl rand -hex 32`. Automatic migration from plaintext on first boot with key set. | ||
| + | |||