Kimi K2.5

hf:moonshotai/Kimi-K2.5

Price: $0.45/mtok in, $3.40/mtok out

hf:moonshotai/Kimi-K2.5 and hf:nvidia/Kimi-K2.5-NVFP4 are aliased internally and can be used interchangeably.

A powerful agentic model with above-average lateral thinking/debugging and great design skills.

Pros: Solid code. Amazing at orchestrating other agents thanks to special “agent swarm” reinforcement learning (source). The only frontier-class model on Synthetic with vision. Best model Synthetic has for UI work (probably because it was trained extensively on vision and on translating visual input into code).

Cons: Prone to outright laziness (keeping code for “backward compatibility”, marking things as “to implement later”) and to thinking a bit too laterally. Keep an eye on it during longer tasks. Not quite as good as GLM-5 for backend work.

Although Kimi was trained with Agent Swarms, getting the same results as MoonshotAI would require their proprietary swarms endpoint, which is not available on Synthetic. However, similar results may be achievable with massively parallel sub-agents, or with an SDK such as Swarms using various roles.
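As a rough illustration of the "massively parallel sub-agents" approach, the sketch below fans one task out to several role-specific sub-agents at once and collects their replies. It is not Moonshot's swarm endpoint or the Swarms SDK. The names fan_out, ModelCall, and fake_model are hypothetical, and fake_model stands in for whatever real chat-completion call you use.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: takes (role, task) and returns that sub-agent's reply.
ModelCall = Callable[[str, str], Awaitable[str]]


async def fan_out(
    task: str,
    roles: list[str],
    call_model: ModelCall,
    max_parallel: int = 8,
) -> dict[str, str]:
    """Run one sub-agent per role concurrently and collect replies by role."""
    gate = asyncio.Semaphore(max_parallel)  # cap the number of in-flight requests

    async def run(role: str) -> tuple[str, str]:
        async with gate:
            return role, await call_model(role, task)

    # gather() returns results in the same order as the roles list
    pairs = await asyncio.gather(*(run(role) for role in roles))
    return dict(pairs)


async def fake_model(role: str, task: str) -> str:
    # Stand-in for a real request to the model (assumption: replace with your client)
    await asyncio.sleep(0)
    return f"[{role}] notes on: {task}"


def demo() -> dict[str, str]:
    roles = ["planner", "coder", "reviewer"]
    return asyncio.run(fan_out("add pagination to /users", roles, fake_model))
```

In practice an orchestrating prompt would merge the per-role replies into a single answer. The semaphore is there so that a large role list does not exceed whatever rate limit applies to your account.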

Load-Balanced Routing (NVFP4 + INT4)

As of April 2026, Synthetic routes Kimi K2.5 requests between two hardware backends based on current load. Both model strings (hf:moonshotai/Kimi-K2.5 and hf:nvidia/Kimi-K2.5-NVFP4) may silently hit either backend. You don’t need to do anything differently.

INT4 variant — runs on H200 GPUs (original Moonshot quant)
NVFP4 variant — runs on B200 GPUs (Nvidia’s new near-lossless quant format, now U.S.-based)

Why: B200 capacity was sometimes overloaded while H200s sat with excess capacity. Routing between them based on load smooths this out and lets Synthetic scale Kimi via B200s going forward. Some INT4 capacity remains on reserved H200s (NVFP4 provides no perf advantage on H200 hardware).
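To make the idea concrete, here is a minimal sketch of load-based routing across two backends. It is not Synthetic's actual router: the Backend class, the in_flight/capacity load metric, and the least-utilized selection rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str       # label only, e.g. "NVFP4/B200" or "INT4/H200"
    in_flight: int  # requests currently being served (assumed load metric)
    capacity: int   # maximum concurrent requests (hypothetical figure)

    @property
    def utilization(self) -> float:
        return self.in_flight / self.capacity


def pick_backend(backends: list[Backend]) -> Backend:
    """Return the least-utilized backend; ties go to the first one listed."""
    return min(backends, key=lambda b: b.utilization)


def route(model: str, backends: list[Backend]) -> str:
    # The model string is deliberately ignored: both aliases share one pool.
    chosen = pick_backend(backends)
    chosen.in_flight += 1
    return chosen.name
```

With a pool where the B200 backend is at 9/10 and the H200 backend at 2/10, a request under either model string would land on the H200 backend in this sketch.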

Benchmark comparison (from Synthetic’s internal testing):

Benchmark	INT4	NVFP4	Delta
AIME	91.0	93.3	NVFP4 +3.3
Aider Polyglot	74.4	71.1	INT4 +3.3
LiveCodeBench (subset)	—	—	NVFP4 +4.0

All deltas are within margin of error. NVFP4 is slightly better on 2 of 3 benchmarks. Synthetic considers the two variants equivalent for routing purposes.
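One way to sanity-check a "within margin of error" claim is a simple binomial standard-error estimate. The sketch below is only that, and it is not how Synthetic evaluated the results. It assumes each benchmark problem is an independent pass/fail trial, and the problem count passed in (225 in the usage note) is an assumption, not a figure from the table.

```python
import math


def standard_error(score_pct: float, n_problems: int) -> float:
    """Binomial standard error of a pass-rate score, in percentage points."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_problems)


def within_noise(a_pct: float, b_pct: float, n_problems: int, z: float = 1.96) -> bool:
    """True if the gap between two scores is smaller than z standard errors."""
    se_diff = math.sqrt(
        standard_error(a_pct, n_problems) ** 2
        + standard_error(b_pct, n_problems) ** 2
    )
    return abs(a_pct - b_pct) < z * se_diff
```

For example, within_noise(74.4, 71.1, 225) compares the two Aider Polyglot scores under an assumed 225-problem set. On small problem sets a single problem can move the score by several points, so gaps of this size are hard to tell apart from noise.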