Organic

Differences

models:kimi-k25 [2026/04/19 04:25] gwyntel
models:kimi-k25 [2026/04/19 22:16] (current) – Added NVFP4/INT4 precision mix detail gwyntel
Line 37:
 All within margin of error. NVFP4 mildly better on 2 of 3 benchmarks. Synthetic considers them equivalent for routing purposes.
  
 +
 +=== Precision Mix Detail ===
 +
 +<WRAP center round info 60%>
 +Since April 7, 2026, Synthetic has served Kimi K2.5 from a **mix of NVFP4 (on B200s) and the original INT4 (on H200s)**. The ''/models'' API endpoint reports NVFP4 as the quantization format, but your request may silently hit either backend depending on current load.
 +</WRAP>
 +
 +When serving a model in multiple precisions, Synthetic reports the "least legitimate" format in the ''/models'' endpoint, i.e., the one furthest from the lab-released original. Since Moonshot's original release was INT4 and Nvidia's NVFP4 is a derived quant, NVFP4 is reported. However, INT4 capacity remains on reserved H200s, where NVFP4 provides no performance advantage.
 +
 +This means: even if you explicitly request ''hf:nvidia/Kimi-K2.5-NVFP4'', your request may still be served by the INT4 variant on H200s when B200 capacity is under load.
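
The reporting rule described above can be illustrated with a minimal sketch. This is not Synthetic's implementation: the ''ServedFormat'' type, its fields, and the ''reported_format'' function are hypothetical names introduced only for illustration, and the rule encoded here (advertise a derived format when one is in the mix, otherwise the original) is an assumption read from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServedFormat:
    """One precision a model is served in (hypothetical type for this sketch)."""
    name: str           # e.g. "INT4" or "NVFP4"
    hardware: str       # e.g. "H200" or "B200"
    lab_original: bool  # True if this is the format the lab itself released


def reported_format(served: list[ServedFormat]) -> str:
    """Return the format name to advertise when several precisions are served.

    Assumed rule: prefer a derived (non-original) format if one is present,
    and fall back to the lab-released original only when nothing else is served.
    If several derived formats exist, this sketch simply takes the first.
    """
    if not served:
        raise ValueError("no formats are being served")
    derived = [f for f in served if not f.lab_original]
    return (derived or served)[0].name


# The Kimi K2.5 mix described in the text.
kimi_k25 = [
    ServedFormat("INT4", "H200", lab_original=True),
    ServedFormat("NVFP4", "B200", lab_original=False),
]
print(reported_format(kimi_k25))  # NVFP4
```

Note that the reported name says nothing about which backend handles a given request; under this sketch, as in the text, INT4 on H200s can still serve traffic while NVFP4 is what gets advertised.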