hf:zai-org/GLM-5
Price: $1.00/mtok in, $3.00/mtok out (beta pricing)
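At these rates, per-request cost is simple arithmetic. A minimal sketch (the token counts below are made up for illustration):

```python
# GLM-5 beta pricing: $1.00 per million input tokens, $3.00 per million output tokens
PRICE_IN = 1.00
PRICE_OUT = 3.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at GLM-5 beta pricing."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# Example: a 20k-token prompt producing a 2k-token completion
print(round(request_cost(20_000, 2_000), 4))  # → 0.026
```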
GLM-5 is in beta and is slated to be retired/proxied once GLM-5.1 exits beta. For new projects, prefer GLM-5.1.
GLM-5 was launched in beta on March 30, 2026, after SGLang made progress stabilizing GLM-5’s new architecture. It trades blows with proprietary models like Opus 4.6 and GPT-5.4 for coding and agentic tasks.
GLM-5 has fewer total parameters than Kimi K2.5, making it more efficient to serve on B200 hardware at NVFP4 quantization. However, vLLM and SGLang support for GLM-5’s architecture was initially poor — Synthetic had to wait for upstream fixes before stable hosting was possible.
The model runs on SGLang (which is faster than vLLM for the GLM series) and uses NVFP4 quantization on B200 GPUs. Each replica requires 4 B200 GPUs (tp4).
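Assuming Synthetic uses SGLang's standard launcher, one replica's launch would look roughly like this. This is a sketch, not Synthetic's actual deployment config; in particular, the `--quantization` value for NVFP4 is an assumption, so check the SGLang server arguments for the exact name:

```shell
# Hypothetical launch of one GLM-5 replica on 4x B200 (tp4).
# --quantization value is assumed; SGLang's NVFP4 support may use a different name.
python -m sglang.launch_server \
  --model-path zai-org/GLM-5 \
  --tp 4 \
  --quantization modelopt_fp4 \
  --host 0.0.0.0 \
  --port 30000
```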
Based on public statements from Synthetic staff, the plan is:
1. Take GLM-5 out of beta; stop self-hosting GLM-4.7
2. Put [[:models:glm-5.1|GLM-5.1]] in beta
3. Once GLM-5.1 is out of beta, retire/proxy GLM-5
Retired models are typically proxied to Fireworks or TogetherAI; how long a proxy stays up depends on load, since proxies are expensive to run.
See also: [[:models:glm-5.1|GLM-5.1]] (the replacement), [[:models:glm-4.7|GLM-4.7]] (the predecessor)