hf:moonshotai/Kimi-K2.5
Price: $0.45/mtok in, $3.40/mtok out
hf:moonshotai/Kimi-K2.5 and hf:nvidia/Kimi-K2.5-NVFP4 are aliased internally and can be used interchangeably.
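Because the two strings are aliases, client code that logs or caches by model id may want to normalize them first. A minimal sketch; the alias map below is illustrative, not Synthetic's internal routing table:

```python
# Hypothetical client-side alias map; Synthetic resolves these internally.
ALIASES = {"hf:nvidia/Kimi-K2.5-NVFP4": "hf:moonshotai/Kimi-K2.5"}

def canonical_model(model_id: str) -> str:
    """Map a known alias to its canonical model string.

    Unknown model strings pass through unchanged, so this is safe to
    apply to every request before logging or cache-keying.
    """
    return ALIASES.get(model_id, model_id)
```

Either string can still be sent to the API as-is; normalization only matters for client-side bookkeeping.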
A powerful agentic model with above-average lateral thinking/debugging and great design skills.
Although Kimi was trained with Agent Swarms, matching MoonshotAI's results would require their proprietary swarms endpoint, which is not available on Synthetic. Similar results can be approximated with massively parallel sub-agents, or with an SDK such as Swarms that assigns agents distinct roles.
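The parallel sub-agent approach can be sketched with `asyncio.gather`. The roles and the stubbed `run_subagent` body are assumptions for illustration; in practice each coroutine would issue a real chat-completion request against the Kimi K2.5 endpoint:

```python
import asyncio

# Illustrative role split; tune roles to your own workflow.
ROLES = ["planner", "coder", "reviewer"]

async def run_subagent(role: str, task: str) -> str:
    # Stub: a real implementation would call the Kimi K2.5 endpoint
    # (e.g. an OpenAI-compatible /chat/completions request) with a
    # role-specific system prompt. Stubbed so the sketch is self-contained.
    await asyncio.sleep(0)
    return f"{role}: handled {task!r}"

async def fan_out(task: str) -> list[str]:
    # Launch one sub-agent per role concurrently; gather preserves order.
    return await asyncio.gather(*(run_subagent(r, task) for r in ROLES))

results = asyncio.run(fan_out("fix the failing test"))
```

A real swarm would add an aggregation step that feeds the sub-agent outputs back to a coordinator agent.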
Since February 15, 2026, Synthetic routes Kimi K2.5 requests between two hardware backends based on current load. Both model strings (hf:moonshotai/Kimi-K2.5 and hf:nvidia/Kimi-K2.5-NVFP4) may silently hit either backend. You don’t need to do anything differently.
Why: B200 capacity was sometimes saturated while H200s sat idle. Load-based routing between them smooths this out and lets Synthetic scale Kimi on B200s going forward. Some INT4 capacity remains on reserved H200s (NVFP4 provides no performance advantage on H200 hardware).
Benchmark comparison (from Synthetic’s internal testing):
| Benchmark | INT4 | NVFP4 | Delta |
|---|---|---|---|
| AIME | 91.0 | 93.3 | NVFP4 +2.3 |
| Aider Polyglot | 74.4 | 71.1 | INT4 +3.3 |
| LiveCodeBench (subset) | — | — | NVFP4 +4.0 |
All deltas are within the margin of error; NVFP4 is mildly better on two of three benchmarks. Synthetic considers the variants equivalent for routing purposes.
Since April 7, 2026, Synthetic serves Kimi K2.5 from a mix of NVFP4 (B200s) and original INT4 (H200s). The /models API endpoint reports NVFP4 as the quant format, but your request may silently hit either backend based on current load.
When a model is served in multiple precisions, Synthetic reports the "least legitimate" format — the derived quant rather than the lab-released original — in the /models endpoint. Since Moonshot's original release was INT4 and Nvidia's NVFP4 is a quant derived from it, NVFP4 is what /models reports. INT4 capacity nevertheless remains on reserved H200s, where NVFP4 offers no performance advantage.
This means that even if you explicitly request hf:nvidia/Kimi-K2.5-NVFP4, your request may still be served by the INT4 variant on H200s during periods of high B200 load.
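The reporting rule above (disclose the derived quant rather than the lab original when both may be served) can be expressed as a small helper. The `(name, is_lab_original)` pair shape is a hypothetical stand-in; Synthetic's real model metadata format is not documented here:

```python
def reported_quant(formats: list[tuple[str, bool]]) -> str:
    """Pick the format /models would report for a multi-precision model.

    `formats` is a hypothetical list of (name, is_lab_original) pairs.
    Per the policy above, a derived quant is reported in preference to
    the lab-released original; if only originals are served, the first
    listed format is reported.
    """
    derived = [name for name, is_original in formats if not is_original]
    return derived[0] if derived else formats[0][0]

# Kimi K2.5's case: INT4 is Moonshot's original, NVFP4 is derived,
# so NVFP4 is what /models reports.
quant = reported_quant([("INT4", True), ("NVFP4", False)])
```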