--- models:kimi-k25 [2026/04/19 04:25] – gwyntel
+++ models:kimi-k25 [2026/04/19 22:16] (current) – Added NVFP4/INT4 precision mix detail gwyntel
@@ Line 37: / Line 37: @@
 All within margin of error. NVFP4 mildly better on 2 of 3 benchmarks. Synthetic considers them equivalent for routing purposes.
+=== Precision Mix Detail ===
+<WRAP center round info 60%>
+Since April 7, 2026, Synthetic has served Kimi K2.5 from a **mix of NVFP4 (B200s) and original INT4 (H200s)**. The ''/models'' API endpoint reports NVFP4 as the quant format, but any given request may silently hit either backend depending on current load.
+</WRAP>
+When serving a model in multiple precisions, Synthetic reports the "least legitimate" format in the ''/models'' endpoint, i.e., the one furthest from the lab-released original. Since Moonshot's original release was INT4 and Nvidia's NVFP4 is a derived quant, NVFP4 is reported. However, INT4 capacity remains on reserved H200s, where NVFP4 provides no performance advantage.
+In practice, even if you explicitly request ''hf:nvidia/Kimi-K2.5-NVFP4'', the request may still be served by the INT4 variant on H200s when B200 capacity is under heavy load.
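
The reporting rule described above can be illustrated with a minimal sketch. This is not Synthetic's implementation: the ''Backend'' record, its field names, and both helper functions are hypothetical, introduced only to show how "report the derived quant, but serve from any backend" plays out for a client reasoning about precision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical record for one serving pool (not a real API type)."""
    quant: str          # e.g. "NVFP4" or "INT4"
    hardware: str       # e.g. "B200" or "H200"
    lab_original: bool  # True if this is the format the lab released


def reported_quant(backends):
    """Return the format reported under the 'least legitimate' rule.

    If any derived (non-lab-original) quant is being served, report it;
    otherwise fall back to the lab-original format.
    """
    if not backends:
        raise ValueError("no backends configured")
    derived = sorted({b.quant for b in backends if not b.lab_original})
    if derived:
        # Tie-breaking between several derived quants is arbitrary here.
        return derived[0]
    return backends[0].quant


def possible_quants(backends):
    """Return every precision a request might actually be served in."""
    return sorted({b.quant for b in backends})


# The Kimi K2.5 mix described above, expressed with the hypothetical record.
kimi_k25 = [
    Backend(quant="NVFP4", hardware="B200", lab_original=False),
    Backend(quant="INT4", hardware="H200", lab_original=True),
]

print(reported_quant(kimi_k25))
print(possible_quants(kimi_k25))
```

The point of separating the two functions is that the reported format is a single label, while the set of formats a request can land on may be larger; anything that depends on exact numerics should plan for the larger set.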