''models:kimi-k25'': created 2026/04/11 14:41 by kat; current revision 2026/04/19 22:16 by gwyntel (Added NVFP4/INT4 precision mix detail).
Although Kimi was trained with Agent Swarms, reproducing MoonshotAI's results would require their proprietary swarms endpoint, which is not available on Synthetic. Similar results may be achievable by running many sub-agents in parallel, or by using an SDK such as [[Swarms]] with agents assigned to different roles.
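A minimal sketch of the parallel sub-agent approach, assuming Python's ''asyncio'' and a hypothetical ''call_model'' stub that stands in for a real request to Kimi K2.5 (no Synthetic endpoint or client library is implied). Each role gets its own prompt, all roles run concurrently, and a semaphore caps how many requests are in flight at once.

```python
import asyncio


# Hypothetical stand-in for a request to Kimi K2.5; a real client would send
# `prompt` to the model here and return its reply.
async def call_model(role: str, prompt: str) -> str:
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"[{role}] draft for: {prompt}"


async def run_subagents(task: str, roles: list[str], max_concurrency: int = 8) -> dict[str, str]:
    """Fan one task out to several role-specific sub-agents in parallel."""
    sem = asyncio.Semaphore(max_concurrency)  # cap simultaneous requests

    async def run_one(role: str) -> tuple[str, str]:
        async with sem:
            prompt = f"You are the {role}. {task}"
            return role, await call_model(role, prompt)

    # gather() runs all sub-agents concurrently and keeps results in role order
    results = await asyncio.gather(*(run_one(r) for r in roles))
    return dict(results)


if __name__ == "__main__":
    roles = ["planner", "coder", "reviewer"]
    out = asyncio.run(run_subagents("Add retry logic to the HTTP client.", roles))
    for role, answer in out.items():
        print(role, "->", answer)
```

A coordinating step that merges or ranks the role outputs would normally follow; it is left out to keep the fan-out pattern visible.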

=== Load-Balanced Routing (NVFP4 + INT4) ===

<WRAP center round tip>
Since February 15, 2026, Synthetic routes Kimi K2.5 requests between two hardware backends based on current load. Both model strings (''
</WRAP>

- **INT4 variant**: runs on H200 GPUs (Moonshot's original quant)
- **NVFP4 variant**: runs on B200 GPUs (Nvidia'

Why: B200 capacity was sometimes overloaded while the H200s had capacity to spare. Routing between the two based on load smooths out this imbalance and lets Synthetic scale Kimi on B200s going forward. Some INT4 capacity remains on reserved H200s, since NVFP4 offers no performance advantage on H200 hardware (Hopper GPUs lack native FP4 tensor-core support).
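The routing policy is only described as "based on current load". A minimal sketch, assuming a least-utilization rule over two pools; the ''Backend'' class, the pool names, and the capacity numbers are hypothetical and not Synthetic's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str        # hypothetical pool label, e.g. "int4-h200"
    in_flight: int   # requests currently being served by this pool
    capacity: int    # max concurrent requests the pool accepts

    @property
    def utilization(self) -> float:
        return self.in_flight / self.capacity


def route(backends: list[Backend]) -> str:
    """Send one request to the least-utilized pool that still has headroom."""
    available = [b for b in backends if b.in_flight < b.capacity]
    if not available:
        raise RuntimeError("all backends are saturated")
    chosen = min(available, key=lambda b: b.utilization)
    chosen.in_flight += 1  # account for the request just assigned
    return chosen.name


if __name__ == "__main__":
    pools = [
        Backend("nvfp4-b200", in_flight=95, capacity=100),
        Backend("int4-h200", in_flight=40, capacity=100),
    ]
    print(route(pools))  # int4-h200: the B200 pool is near capacity
```

Because both variants are treated as equivalent for routing purposes, the sketch compares only load, never precision.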

**Benchmark comparison** (from Synthetic'

^ Benchmark ^ INT4 ^ NVFP4 ^ Delta ^
| AIME | 91.0 | 93.3 | NVFP4 +2.3 |
| Aider Polyglot | 74.4 | 71.1 | INT4 +3.3 |
| LiveCodeBench (subset) | n/a | n/a | NVFP4 +4.0 |

All differences are within the margin of error, and NVFP4 scores slightly higher on 2 of the 3 benchmarks. Synthetic considers the two variants equivalent for routing purposes.
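Each delta can be recomputed from the two per-variant scores as a quick consistency check. A minimal sketch; it uses only the AIME and Aider Polyglot scores listed in the benchmark table, and leaves LiveCodeBench out because its raw scores are not given.

```python
SCORES = {
    "AIME": {"INT4": 91.0, "NVFP4": 93.3},
    "Aider Polyglot": {"INT4": 74.4, "NVFP4": 71.1},
}


def deltas(scores: dict[str, dict[str, float]]) -> dict[str, tuple[str, float]]:
    """Return (leading variant, margin) per benchmark, rounded to one decimal."""
    out = {}
    for bench, s in scores.items():
        d = round(s["NVFP4"] - s["INT4"], 1)
        leader = "NVFP4" if d > 0 else "INT4" if d < 0 else "tie"
        out[bench] = (leader, abs(d))
    return out


if __name__ == "__main__":
    for bench, (leader, margin) in deltas(SCORES).items():
        print(f"{bench}: {leader} +{margin}")
    # AIME: NVFP4 +2.3
    # Aider Polyglot: INT4 +3.3
```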

=== Precision Mix Detail ===

<WRAP center round info 60%>
Since April 7, 2026, Synthetic serves Kimi K2.5 from a **mix of NVFP4 (B200s) and original INT4 (H200s)**. The ''/
</WRAP>

When serving a model in multiple precisions, Synthetic reports the "least legitimate"

This means: even if you explicitly use ''
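The reporting rule for mixed-precision serving amounts to taking a minimum over the served precisions under some ordering. A minimal sketch, assuming a hypothetical ''LEGITIMACY_RANK'' in which NVFP4 ranks below INT4 because INT4 is Moonshot's original quant; the actual ordering Synthetic applies is not spelled out on this page.

```python
# Hypothetical ranking (assumption): a lower value means "less legitimate",
# i.e. further from the model author's own release. NVFP4 sits below INT4
# only because INT4 is Moonshot's original quant; Synthetic's real ordering
# is not stated here.
LEGITIMACY_RANK = {
    "nvfp4": 0,
    "int4": 1,
}


def reported_precision(served: set[str]) -> str:
    """Return the least legitimate precision among those currently served."""
    if not served:
        raise ValueError("model is not served in any precision")
    # A KeyError on an unknown precision is deliberate: the ranking must cover it.
    return min(served, key=LEGITIMACY_RANK.__getitem__)


if __name__ == "__main__":
    print(reported_precision({"int4", "nvfp4"}))  # nvfp4 (mixed B200 + H200 serving)
    print(reported_precision({"int4"}))           # int4 (H200-only serving)
```

Under this assumption, a model string backed by both B200 and H200 pools is reported as NVFP4 even when a particular request happens to land on an H200.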