GLM-5.1

hf:zai-org/GLM-5.1

Price: $1.00/mtok in, $3.00/mtok out

GLM-5.1 is currently in beta. It may have a higher rate limit impact than stable models. The price may increase to better reflect compute costs; roughly $1.40/mtok in and $4.40/mtok out has been discussed.
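To put the discussed increase in concrete terms, the following Python sketch prices one session at both the current and the discussed rates. The rates are the ones quoted on this page (the discussed ones are approximate and not confirmed); the token counts are a hypothetical workload chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the USD cost of a request or session."""
        total = (input_tokens * self.input_per_mtok
                 + output_tokens * self.output_per_mtok)
        return total / 1_000_000


CURRENT_PRICE = Pricing(1.00, 3.00)    # current beta price from this page
DISCUSSED_PRICE = Pricing(1.40, 4.40)  # approximate, discussed but not confirmed

# Hypothetical agentic session: input-heavy, as is typical for coding agents.
session = {"input_tokens": 2_000_000, "output_tokens": 100_000}

cost_now = CURRENT_PRICE.cost(**session)
cost_discussed = DISCUSSED_PRICE.cost(**session)
print(f"current: ${cost_now:.2f}, discussed: ${cost_discussed:.2f}")
```

For this assumed workload, almost all of the difference comes from the input side, since input tokens outnumber output tokens 20 to 1.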

GLM-5.1 is currently the smartest and most capable coding and agentic model hosted directly by Synthetic. It is also the most capable open-weight model overall, trading blows with SOTA proprietary models like Opus 4.6 and GPT-5.4.

Consequently, it is also currently the most expensive model. Its output cost is actually lower than Kimi K2.5’s, but its input cost is 2x Kimi K2.5’s. Because input tokens usually far outnumber output tokens in coding and agentic use cases, input prices dominate when comparing model cost.
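The trade-off above can be sketched as a break-even calculation. In the Python below, GLM-5.1's prices come from this page and Kimi K2.5's input price is derived from the "2x" statement. Kimi K2.5's output price is not given here, so it is left as a parameter, and the value used in the example is a hypothetical placeholder, not a real price.

```python
GLM_IN, GLM_OUT = 1.00, 3.00  # USD/mtok, from this page
KIMI_IN = GLM_IN / 2          # derived: GLM-5.1 input is 2x Kimi K2.5's


def breakeven_input_output_ratio(kimi_out: float) -> float:
    """Input:output token ratio above which GLM-5.1 costs more than Kimi K2.5.

    GLM-5.1 costs more when
        GLM_IN*I + GLM_OUT*O > KIMI_IN*I + kimi_out*O
    which rearranges to
        I / O > (kimi_out - GLM_OUT) / (GLM_IN - KIMI_IN).
    """
    if kimi_out <= GLM_OUT:
        raise ValueError("this page states Kimi K2.5's output price is higher")
    return (kimi_out - GLM_OUT) / (GLM_IN - KIMI_IN)


def glm_costs_more(input_tokens: int, output_tokens: int, kimi_out: float) -> bool:
    """Compare the two models directly for one workload."""
    glm = input_tokens * GLM_IN + output_tokens * GLM_OUT
    kimi = input_tokens * KIMI_IN + output_tokens * kimi_out
    return glm > kimi


HYPOTHETICAL_KIMI_OUT = 3.50  # placeholder only; substitute the real price
ratio = breakeven_input_output_ratio(HYPOTHETICAL_KIMI_OUT)
print(f"GLM-5.1 costs more once input:output exceeds {ratio:.1f}:1")
```

Under that placeholder price the break-even ratio is low, which matches the point made above: for input-heavy coding and agentic sessions, the input price decides the comparison.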

Pros: Excels at almost all coding and long-horizon agentic work. Widely considered to be exceptionally good at code review. Even better than GLM-5, which was already excellent.
Cons: Quite a bit worse at user interface work than Kimi K2.5. Overkill for basic assistant work, such as OpenClaw. Worse at lateral thinking than other frontier models (needs more explicit guidance). More compute-intensive than Kimi K2.5, so it runs closer to redline during peak hours.

Compute Requirements

GLM-5.1 requires 4 B200 GPUs per replica at NVFP4 (same as GLM-5). It is more compute-intensive to serve than Kimi K2.5, and Synthetic has acknowledged that the current price may need to increase to better reflect actual compute costs and to improve average performance during peak times.

The model uses SGLang for inference (which provides better cache hit behavior than vLLM for small requests) and runs on B200 hardware with NVFP4 quantization. There is no official NVFP4 quant from Nvidia yet, but llmcompressor supports the GLM-5 architecture, so Synthetic can produce their own quant.

Known Issues

Prefill stalling: More active parameters mean more expensive prefill. Under load, prefills can block decode, which users perceive as stalling. This affects GLM-5/5.1 more than GLM-4.7.
Capacity constraints: GLM-5.1 has been running close to redline during peak hours. Synthetic is working on bringing up more compute.
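The prefill stalling described above can be illustrated with a toy model. The Python sketch below assumes an engine that runs one job at a time, so a prefill scheduled between two decode steps delays the next token by the full prefill time. This is a deliberate simplification and not a description of how SGLang actually schedules work; the function name and all timing numbers are hypothetical.

```python
def inter_token_gaps(decode_step_s: float, n_tokens: int,
                     prefill_before_token: int, prefill_s: float) -> list[float]:
    """Gaps (seconds) between streamed tokens for one request on a toy engine.

    Another request's prefill is scheduled just before token index
    `prefill_before_token`; decode for our request waits until it finishes.
    """
    gaps = []
    for i in range(n_tokens):
        gap = decode_step_s
        if i == prefill_before_token:
            gap += prefill_s  # decode is blocked for the whole prefill
        gaps.append(gap)
    return gaps


# Hypothetical numbers: a 60k-token prompt prefilled at 20k tokens/s,
# interrupting a stream that otherwise emits one token every 20 ms.
prefill_s = 60_000 / 20_000
gaps = inter_token_gaps(decode_step_s=0.02, n_tokens=5,
                        prefill_before_token=2, prefill_s=prefill_s)
print(f"worst inter-token gap: {max(gaps):.2f} s")
```

In this toy setup a single long prefill turns a 20 ms token interval into a pause of several seconds, which is what a user sees as the stream stalling. A more expensive prefill lengthens that pause.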

See also: GLM-5 (predecessor, being retired), Kimi K2.5 (complementary frontier model with vision)