models:glm-5.1 [2026/04/21 13:02] xenolandscapes
models:glm-5.1 [2026/04/21 13:07] (current) xenolandscapes
GLM-5.1 runs on 4 B200 GPUs per replica at NVFP4 quantization (same as GLM-5), and uses SGLang instead of vLLM for better cache hit rates. This is compared to 8 B200s for Kimi K2.5, making it theoretically both faster (due to less NVLink overhead) and cheaper (due to less energy required, and needing to rent fewer expensive GPUs). There is no official NVFP4 quant from Nvidia yet, but llmcompressor supports the GLM-5 architecture, which has allowed Synthetic to produce their own quant.
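The cost side of the argument above can be sketched as a back-of-envelope calculation. The hourly B200 rental rate below is a made-up placeholder, not a figure from Synthetic; only the per-replica GPU counts come from this page:

```python
# Hypothetical rental rate per B200 GPU-hour (placeholder, not a quoted price).
B200_HOURLY_USD = 6.00

def replica_cost_per_hour(num_gpus: int, gpu_hourly: float = B200_HOURLY_USD) -> float:
    """Cost of keeping one inference replica up for an hour."""
    return num_gpus * gpu_hourly

glm_51 = replica_cost_per_hour(4)    # GLM-5.1: 4x B200 at NVFP4
kimi_k25 = replica_cost_per_hour(8)  # Kimi K2.5: 8x B200

print(f"GLM-5.1 replica: ${glm_51:.2f}/h, Kimi K2.5 replica: ${kimi_k25:.2f}/h")
print(f"Kimi K2.5 costs {kimi_k25 / glm_51:.0f}x more per replica-hour")
```

Whatever the actual rental rate, the halved GPU count halves the per-replica-hour cost, which is what makes Synthetic's later claim about GLM-5.1's compute intensity surprising.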
  
Despite the above numbers, Synthetic claims that GLM-5.1 is actually *more* compute-intensive than Kimi K2.5, and says that the price may need to increase to better reflect this. The price point they have pointed to as the goal (the "market rates") is per-token API pricing on OpenRouter, where GLM-5.1 is slightly more expensive than GLM-5.

However, users have noted that Synthetic's subscription-based pricing model differs significantly from the per-token pricing seen on OpenRouter: OpenRouter providers are trying to make a per-token profit, so the price rise there could simply reflect greater demand for GLM-5.1 due to its increased capabilities, whereas under Synthetic's subscription model token pricing is only meant to be a bellwether for compute costs.
  
These price hikes have been floated as a solution to the recent instability of the GLM-5.1 replicas, covered below.
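The mismatch between the two pricing models can be illustrated with hypothetical numbers (none of the rates below are Synthetic's or OpenRouter's actual prices):

```python
def per_token_cost(tokens: int, usd_per_mtok: float) -> float:
    """OpenRouter-style billing: every token is charged at a listed rate."""
    return tokens / 1_000_000 * usd_per_mtok

def subscription_effective_rate(monthly_fee_usd: float, tokens_used: int) -> float:
    """Subscription-style: flat fee, so the effective $/Mtok depends on usage."""
    return monthly_fee_usd / (tokens_used / 1_000_000)

# A per-token price hike directly raises an OpenRouter-style bill...
print(per_token_cost(50_000_000, 2.50))  # 125.0 USD at a hypothetical $2.50/Mtok
print(per_token_cost(50_000_000, 3.00))  # 150.0 USD after a hypothetical hike

# ...while a subscriber's effective rate is set by their own usage,
# not by the list price, which is why users question the comparison.
print(subscription_effective_rate(60.0, 50_000_000))  # 1.2 $/Mtok
```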
  - **Prefill stalling:** More active parameters mean more expensive prefill. Under load, prefills can block decode, causing perceived stalling and, in some cases, eventual request timeouts. This affects GLM-5/5.1 more than GLM-4.7.
  - **Capacity constraints:** GLM-5.1 has been running close to redline during peak hours. Synthetic is working on bringing up more compute.
  - **Node instability:** There have been several instances of entire GPU nodes crashing in the last two or three weeks, leading to service gaps in which official status updates often lag behind community reports in the Discord.
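The prefill-stalling point above can be made concrete with a rough compute estimate. The active-parameter counts and FLOP throughput below are hypothetical illustrations; the page only states that GLM-5/5.1 have more active parameters than GLM-4.7, not the actual figures:

```python
def prefill_seconds(prompt_tokens: int, active_params_billions: float,
                    effective_flops_per_s: float = 1e15) -> float:
    """Rough prefill time using the standard ~2 FLOPs per parameter
    per token estimate for a forward pass."""
    flops_needed = 2 * active_params_billions * 1e9 * prompt_tokens
    return flops_needed / effective_flops_per_s

# Hypothetical active-parameter counts, for illustration only.
smaller_model = prefill_seconds(32_000, 12.0)  # a GLM-4.7-like model
larger_model = prefill_seconds(32_000, 40.0)   # a GLM-5.1-like model

# If a scheduler runs a prefill to completion before resuming decode,
# every in-flight generation stalls for roughly this long.
print(f"{smaller_model:.2f}s vs {larger_model:.2f}s per 32k-token prefill")
```

The prefill time scales linearly with active parameters, so under these assumptions the larger model holds decode back several times longer per long prompt, matching the reported behavior.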
  
It is unclear if an increase in pricing would resolve any of these issues.
  
See also: [[:models:glm-5|GLM-5]] (predecessor, being retired), [[:models:kimi-k25|Kimi K2.5]] (complementary frontier model with vision)