hf:zai-org/GLM-5
Price: $1.00/mtok in, $3.00/mtok out (beta pricing)
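At these rates, per-request cost is simple arithmetic. A minimal sketch (the token counts below are made up for illustration):

```python
# GLM-5 beta pricing: $1.00 per million input tokens, $3.00 per million output tokens
PRICE_IN = 1.00
PRICE_OUT = 3.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at GLM-5 beta pricing."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# Example: a 20k-token prompt producing a 2k-token completion
print(round(request_cost(20_000, 2_000), 4))  # → 0.026
```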
GLM-5 is in beta and is slated to be retired/proxied once GLM-5.1 exits beta. For new projects, prefer GLM-5.1.
GLM-5 was launched in beta on March 30, 2026, after SGLang made progress stabilizing GLM-5’s new architecture. It trades blows with proprietary models like Opus 4.6 and GPT-5.4 for coding and agentic tasks.
GLM-5 has fewer total parameters than Kimi K2.5, making it more efficient to serve on B200 hardware at NVFP4 quantization. However, vLLM and SGLang support for GLM-5’s architecture was initially poor — Synthetic had to wait for upstream fixes before stable hosting was possible.
The model runs on SGLang (which is faster than vLLM for the GLM series) and uses NVFP4 quantization on B200 GPUs. Each replica requires 4 B200 GPUs (tp4).
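Assuming Synthetic uses SGLang's standard launcher, one replica's launch would look roughly like this. This is a sketch, not Synthetic's actual deployment config; in particular, the `--quantization` value for NVFP4 is an assumption, so check the SGLang server arguments for the exact name:

```shell
# Hypothetical launch of one GLM-5 replica on 4x B200 (tp4).
# --quantization value is assumed; SGLang's NVFP4 support may use a different name.
python -m sglang.launch_server \
  --model-path zai-org/GLM-5 \
  --tp 4 \
  --quantization modelopt_fp4 \
  --host 0.0.0.0 \
  --port 30000
```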
Based on public statements from Synthetic staff, the plan is:
1. Take GLM-5 out of beta; stop self-hosting GLM-4.7
2. Put [[:models:glm-5.1|GLM-5.1]] in beta
3. Once GLM-5.1 is out of beta, retire/proxy GLM-5
Retired models are typically proxied to Fireworks or TogetherAI; how long a proxy stays up depends on load, since proxies are expensive to run.
See also: [[:models:glm-5.1|GLM-5.1]] (the replacement), [[:models:glm-4.7|GLM-4.7]] (the predecessor)