plexus [2026/04/10 21:57] – created matt_cowger.us → plexus [2026/04/10 23:06] (current) xenolandscapes
====== Plexus ======

[[https://github.com/mcowger/plexus|Plexus]]

A unified LLM API gateway handling protocol translation, failover, and usage tracking. Maintained by Matt Cowger (@mcowger on the Synthetic Discord). The Synthetic Discord has a [[https://discord.com/channels/1315627714056687706/1446759141921132584|#plexus]] thread dedicated to Plexus support.

Currently the most practical way to use multiple AI providers through a single endpoint without rewriting client code for each API format. Routes OpenAI, Anthropic, Gemini, and any OpenAI-compatible provider through one interface.

Also supports OAuth-backed providers (GitHub Copilot, Claude, Codex, Gemini CLI) without API key management. Currently the only gateway with built-in vision fallthrough for non-vision models.
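
Any OpenAI-compatible client can talk to the gateway by pointing at its base URL. A minimal sketch (the endpoint, port, key, and the ''fast-chat'' alias are assumptions for illustration, not defaults shipped by Plexus):

```python
import json
import urllib.request

# Hypothetical local Plexus endpoint and key -- adjust to your deployment.
PLEXUS_URL = "http://localhost:4000/v1/chat/completions"
API_KEY = "px-example-key"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-format chat request aimed at the gateway.

    The same payload works no matter which backend provider Plexus
    routes the alias to -- clients never change per provider.
    """
    body = json.dumps({
        "model": model,  # a gateway alias, not a concrete provider model
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        PLEXUS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("fast-chat", "Hello")
```

Because the gateway owns routing, swapping the backend behind an alias requires no client-side changes.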

Pros:

  * Routes between providers automatically with configurable selectors.
  * Converts API formats bidirectionally: send Anthropic-style requests to OpenAI backends and vice versa.
  * Vision fallthrough lets cheap models handle images via automatic text description.
  * Per-request cost tracking with quota enforcement by tokens, requests, or dollars.
  * Web UI for configuration.
  * Best available conversion accuracy: higher translation fidelity than LiteLLM, AxonHub, and similar tools.

Cons:

  * Adds ~20-50ms latency vs direct provider calls.
  * Self-hosted means you manage uptime.
  * SQLite default won't scale past a single node; PostgreSQL required for production.

**Vision Fallthrough**

  * Unique capability allowing vision aliases to work with non-vision backends.
  * Intercepts images, sends them to a descriptor model (Gemini Flash, GPT-5.3-Codex) for conversion to text, and passes the descriptions to the target.
  * Enables specialized models to handle image inputs transparently without native vision support.

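The fallthrough can be sketched as a message rewrite. This is an illustrative reimplementation, not Plexus's internal code; the ''describe'' callback stands in for the call to the descriptor model:

```python
def apply_vision_fallthrough(messages, describe):
    """Rewrite messages so a non-vision model can serve them: each
    image part is replaced by a text description.  `describe` stands
    in for the descriptor-model call (e.g. Gemini Flash)."""
    rewritten = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):   # plain-text message: untouched
            rewritten.append(msg)
            continue
        parts = []
        for part in content:
            if part.get("type") == "image_url":
                # Intercept the image and keep only its description.
                desc = describe(part["image_url"]["url"])
                parts.append({"type": "text", "text": f"[image: {desc}]"})
            else:
                parts.append(part)
        rewritten.append({**msg, "content": parts})
    return rewritten
```
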
**Routing & Selection**

Model aliases backed by multiple providers:

^ Selector ^ Behavior ^
| ''random'' | Distribute across healthy targets (default) |
| ''in_order'' | Failover sequence |
| ''cost'' | Cheapest provider wins |
| ''performance'' | Highest tokens/sec |
| ''latency'' | Lowest time-to-first-token |

''priority: api_match'' enables pass-through for same-format providers, skipping transformation overhead.

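The selector behaviors in the table reduce to picking one healthy target by a metric. A sketch (target records and field names are illustrative, not Plexus's schema):

```python
import random

# Illustrative target records; field names are assumptions.
TARGETS = [
    {"name": "provider-a", "healthy": True,  "cost_per_mtok": 0.50, "tps": 90,  "ttft_ms": 420},
    {"name": "provider-b", "healthy": True,  "cost_per_mtok": 0.30, "tps": 140, "ttft_ms": 250},
    {"name": "provider-c", "healthy": False, "cost_per_mtok": 0.10, "tps": 200, "ttft_ms": 200},
]

def select(targets, selector="random"):
    """Pick one healthy target per the selector table above
    (a sketch of the behaviors, not Plexus's actual code)."""
    healthy = [t for t in targets if t["healthy"]]
    if selector == "random":
        return random.choice(healthy)
    if selector == "in_order":
        return healthy[0]                                   # first healthy wins
    if selector == "cost":
        return min(healthy, key=lambda t: t["cost_per_mtok"])
    if selector == "performance":
        return max(healthy, key=lambda t: t["tps"])
    if selector == "latency":
        return min(healthy, key=lambda t: t["ttft_ms"])
    raise ValueError(f"unknown selector: {selector}")
```

Note that the unhealthy ''provider-c'' never wins, even though it is cheapest and fastest: health filtering happens before the selector runs.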
**Protocol Translation**

Bidirectional conversion between:
  - OpenAI chat completions (''/v1/chat/completions'')
  - OpenAI responses (''/v1/responses''), with full support for stateful multi-turn via ''previous_response_id'', 7-day TTL storage, and function calling
  - Anthropic messages (''/v1/messages'')
  - Google Gemini native format (''/v1beta'')

Handles streaming SSE and tool-use normalization across all formats.

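As an illustration of one direction, an Anthropic ''/v1/messages'' body maps onto an OpenAI ''/v1/chat/completions'' body roughly like this (a sketch that ignores tools, streaming, and structured content blocks):

```python
def anthropic_to_openai(req: dict) -> dict:
    """Map an Anthropic /v1/messages body onto an OpenAI
    /v1/chat/completions body.  A sketch only: real translation also
    handles tools, streaming, and content blocks."""
    messages = []
    if "system" in req:
        # Anthropic carries the system prompt at the top level;
        # OpenAI expects it as the first message.
        messages.append({"role": "system", "content": req["system"]})
    messages.extend(req["messages"])
    return {
        "model": req["model"],
        "messages": messages,
        # Required by Anthropic, optional for OpenAI; forwarded as-is.
        "max_tokens": req.get("max_tokens"),
    }
```
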
**Deep Inspection**

  * Per-request capture of full request/response bodies, transformation steps, routing decisions, and timing breakdowns.
  * Every request gets a UUID; trace the full lifecycle from ingress to provider response.
  * Stores 30 days of debug logs by default (configurable).
  * View the timing waterfall: parse → route → transform → provider roundtrip → transform → serialize.
    * See exactly which provider handled the request and why.

Pros:
  * Complete visibility into routing decisions and transformation errors.
  * Captured request/response bodies help debug client issues without reproducing them.
  * Per-provider/model performance metrics (TTFT, TPS, total latency) help identify regressions.
  * Full error context, including stack traces when transformations fail.

Cons:
  * Capturing full bodies increases storage significantly: ~10-50KB per request depending on payload size.
  * The default SQLite deployment grows quickly under high load; PostgreSQL recommended for inspection-heavy use.

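The per-request UUID and timing waterfall can be sketched as a trace record accumulated as the request moves through the pipeline (the record shape is an assumption, not Plexus's actual schema):

```python
import time
import uuid

class RequestTrace:
    """Per-request trace: a UUID plus a timing waterfall of named
    stages (parse -> route -> transform -> ...).  Illustrative only."""

    def __init__(self):
        self.id = str(uuid.uuid4())   # every request gets a UUID
        self.stages = []              # list of (stage_name, duration_ms)

    def timed(self, name, fn, *args):
        """Run one pipeline stage and record how long it took."""
        start = time.perf_counter()
        result = fn(*args)
        self.stages.append((name, (time.perf_counter() - start) * 1000))
        return result

# Walk a toy request through the first two waterfall stages.
trace = RequestTrace()
body = trace.timed("parse", lambda raw: {"model": "fast-chat"}, b"{}")
target = trace.timed("route", lambda req: "provider-b", body)
```
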
**Quota Enforcement**

  * Per-API-key limits: ''tokens'', ''requests'', or ''cost''.
  * Windows: rolling, daily, weekly.
  * Hard stops when exceeded.

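A hard-stop check of this kind reduces to comparing current window usage against each configured limit (field names are illustrative, not Plexus's schema):

```python
def check_quota(usage: dict, limits: dict):
    """Hard stop: reject when any configured limit (tokens, requests,
    or cost) is already exhausted for the current window."""
    for metric, limit in limits.items():
        used = usage.get(metric, 0)
        if used >= limit:
            return False, f"quota exceeded: {metric} ({used}/{limit})"
    return True, "ok"
```
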
**Provider Cooldowns**

  * Automatic exponential backoff on failure: 2 min → 4 min → 8 min → ... → 5 hr cap.
  * Parses quota and Retry-After headers automatically to set cooldowns.
  * A successful request resets the counter.
  * Can be disabled per provider.

**MCP Proxy**

Model Context Protocol server proxy with per-request session isolation.

  * HTTP transport only; prevents tool sprawl across clients.

**Encryption at Rest**

  * Optional AES-256-GCM for API keys, OAuth tokens, MCP headers.
  * Generate a key once: ''openssl rand -hex 32''. Automatic migration from plaintext on first boot with the key set.
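
''openssl rand -hex 32'' emits 64 hex characters, i.e. exactly the 32 bytes (256 bits) that AES-256-GCM requires. A sketch of validating a configured key (the function name is illustrative, not part of Plexus):

```python
def load_encryption_key(hex_key: str) -> bytes:
    """Decode and validate a configured key: AES-256-GCM needs
    exactly 32 bytes (256 bits)."""
    key = bytes.fromhex(hex_key)
    if len(key) != 32:
        raise ValueError(f"AES-256 key must be 32 bytes, got {len(key)}")
    return key
```
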