


GLM-5.1

hf:zai-org/GLM-5.1

Price: $1.00/mtok in, $3.00/mtok out

GLM-5.1 is currently in beta. It may have a higher rate-limit impact than stable models, and its price may increase to better reflect compute costs (~$1.40/mtok in, ~$4.40/mtok out has been discussed).

GLM-5.1 is currently the smartest and most capable coding and agentic model hosted directly by Synthetic. It is also the most capable open-weight model, period, trading blows with SOTA proprietary models like Opus 4.6 and GPT-5.4.

Consequently, it is also currently the most expensive model. While its output cost is actually lower than Kimi K2.5’s, its input cost is 2x Kimi K2.5’s, and since input tokens usually dominate output tokens in coding and agentic use cases, input prices dominate when comparing model cost.
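The input-dominance point can be made concrete with a quick back-of-the-envelope calculation. GLM-5.1’s prices are the ones listed above; the Kimi K2.5 prices here are illustrative placeholders only (input at half of GLM-5.1’s, as stated above, and a slightly higher assumed output price), since this page does not list Kimi K2.5’s actual rates:

```python
def request_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in dollars, with prices given per million tokens."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

# A typical coding-agent turn: a large context in, a small diff out.
in_tok, out_tok = 100_000, 5_000

glm = request_cost(in_tok, out_tok, in_price=1.00, out_price=3.00)
kimi = request_cost(in_tok, out_tok, in_price=0.50, out_price=3.50)  # assumed prices

print(f"GLM-5.1: ${glm:.4f}  Kimi K2.5 (assumed): ${kimi:.4f}")
# GLM-5.1 comes out costlier despite its lower output price,
# because the 100k input tokens dominate the bill.
```

Under these assumptions the turn costs $0.115 on GLM-5.1 versus $0.0675 on Kimi K2.5, even though GLM-5.1’s per-token output price is lower.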

  • Pros: Excels at almost all coding and long-horizon agentic work. Widely considered to be exceptionally good at code review. Even better than GLM-5 which was already excellent.
  • Cons: Quite a bit worse at user interface work than Kimi K2.5. Overkill for basic assistant work, such as for OpenClaw. Worse at lateral thinking than other frontier models (it needs more explicit guidance). More compute-intensive than Kimi K2.5, running closer to redline during peak hours.

Compute Requirements

GLM-5.1 runs on 4 B200 GPUs per replica at NVFP4 quantization (same as GLM-5), and uses SGLang instead of vLLM for better cache hit rates. This is compared to 8 B200s for Kimi K2.5, making it theoretically both faster (due to less NVLink overhead) and cheaper (due to less energy required, and needing to rent fewer expensive GPUs). There is no official NVFP4 quant from Nvidia yet, but llmcompressor supports the GLM-5 architecture, which has allowed Synthetic to produce their own quant.
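A self-produced NVFP4 checkpoint of the kind described above would typically be built with an llm-compressor one-shot quantization recipe. The sketch below is hypothetical and is not Synthetic’s actual pipeline: the model repo ID, calibration dataset, and sample count are placeholders, and the scheme name and API surface should be verified against the installed llm-compressor version.

```python
# Hypothetical sketch of producing an NVFP4 quant with llm-compressor.
# Model ID, dataset, and sample count are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "zai-org/GLM-5.1"  # placeholder Hugging Face repo ID

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize Linear layers to NVFP4 (4-bit weights and activations),
# keeping the output head in higher precision.
recipe = QuantizationModifier(
    targets="Linear", scheme="NVFP4", ignore=["lm_head"]
)

# NVFP4 activation scales require calibration data.
oneshot(
    model=model,
    recipe=recipe,
    dataset="open_platypus",  # placeholder calibration set
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("GLM-5.1-NVFP4")
tokenizer.save_pretrained("GLM-5.1-NVFP4")
```

The resulting compressed-tensors checkpoint can then be served by SGLang or vLLM on hardware with native FP4 support such as the B200.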

Despite the above numbers, Synthetic claims that GLM-5.1 is actually *more* compute-intensive than Kimi K2.5, says that the price may need to increase to better reflect this, and has floated the idea that a price increase might resolve GLM-5.1’s noted instability (see below). The price point they have pointed to as the goal (the “market rate”) is the per-token API price on OpenRouter, where GLM-5.1 is slightly more expensive than GLM-5.

However, users have noted that Synthetic’s subscription-based pricing model differs significantly from the per-token pricing model seen on OpenRouter. On OpenRouter, providers are trying to make a per-token profit, so the rise in prices there could simply reflect greater demand for GLM-5.1 given its increased capabilities, whereas under Synthetic’s subscription model, token pricing is only meant to be a bellwether for compute costs.

These price hikes have been floated as a solution to the recent instability of the GLM-5.1 replicas, covered below.

Known Issues

  1. Prefill stalling: More active parameters make prefill more expensive. Under load, prefill batches can block decode, causing perceived stalling and, in some cases, request timeouts. This affects GLM-5/5.1 more than GLM-4.7.
  2. Capacity constraints: GLM-5.1 has been running close to redline during peak hours. Synthetic is working on bringing up more compute.
  3. Node instability: There have been several instances of entire GPU nodes crashing in the last two to three weeks. The cause is not clear, and Synthetic’s monitoring often does not catch a crash until several people have reported it in the Discord.
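The prefill-stalling failure mode in item 1 can be illustrated with a toy single-queue scheduler. The timings here are made up for illustration (20 ms per decode step, 0.5 ms per prefill token) and do not reflect Synthetic’s actual hardware:

```python
def max_intertoken_gap_ms(prefill_tokens: int,
                          decode_step_ms: float = 20.0,
                          prefill_ms_per_token: float = 0.5) -> float:
    """Worst-case gap between two streamed decode tokens when a new
    request's prefill is scheduled in between and blocks decode.

    Without chunked prefill, the entire prefill runs to completion
    before the next decode step, so a decoding user waits for both.
    Timings are illustrative placeholders, not measured values.
    """
    return prefill_tokens * prefill_ms_per_token + decode_step_ms

# A 100k-token context arriving mid-stream stalls other users'
# streams for ~50 seconds in this toy model: long enough to look
# like a hang, and long enough to trip a short client timeout.
print(max_intertoken_gap_ms(100_000))  # 50020.0
```

Chunked prefill (interleaving slices of prefill with decode steps) bounds this gap at the cost of slower prefill completion, which is one reason serving engines expose it as a tunable.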

It is unclear if an increase in pricing would resolve any of these issues.

See also: GLM-5 (predecessor, being retired), Kimi K2.5 (complementary frontier model with vision)