Differences

This shows you the differences between two versions of the page.

models:glm-5 [2026/04/11 14:36] – created kat
models:glm-5 [2026/04/19 22:16] (current) – Updated GLM-5 page: split from GLM-5.1, added deprecation notice gwyntel

Line 3:
 ''hf:zai-org/GLM-5''
  
-**Price**: $1.00/mtok in, $3.00/mtok out
 +**Price**: $1.00/mtok in, $3.00/mtok out (beta pricing)
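
For orientation, a call against this model id might look like the sketch below, assuming Synthetic exposes an OpenAI-compatible chat completions endpoint; the base URL and API-key variable are placeholders, not values documented on this page.

<code python>
import os
from openai import OpenAI

# Sketch only: assumes an OpenAI-compatible endpoint. The base URL and the
# SYNTHETIC_API_KEY variable are placeholders, not documented on this page.
client = OpenAI(
    base_url="https://api.synthetic.example/v1",
    api_key=os.environ["SYNTHETIC_API_KEY"],
)

resp = client.chat.completions.create(
    model="hf:zai-org/GLM-5",  # model id from this page
    messages=[{"role": "user", "content": "Review this diff for correctness: ..."}],
)
print(resp.choices[0].message.content)
</code>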
  
-Currently the <wrap hi>smartest/most capable coding and agentic model</wrap> hosted directly by Synthetic.
 +<WRAP center round alert 60%>
 +GLM-5 is in **beta** and is slated to be retired/proxied once [[:models:glm-5.1|GLM-5.1]] exits beta. For new projects, prefer GLM-5.1. 
 +</WRAP>
  
-Also, barring GLM 5.1, the most capable open weight model period, trading blows with SOTA proprietary models like Opus 4.6 and GPT-5.4.
 +GLM-5 was launched in beta on March 30, 2026, after SGLang made progress stabilizing GLM-5's new architecture. It trades blows with proprietary models like Opus 4.6 and GPT-5.4 for coding and agentic tasks.
  
-Consequently, it is also currently the most expensive model; while its output cost is actually lower than Kimi K2.5's, <wrap hi>GLM 5's input costs are 2x Kimi K2.5's</wrap>, and input tokens usually far outnumber output tokens in coding and agentic use cases, so the input price dominates when comparing model costs.
 +  * **Pros:** Excellent at backend coding and long-horizon agentic work. Fewer total parameters than Kimi K2.5, so it can run at tp4 on B200 NVFP4 (4 GPUs vs K2.5's required tp8), which is usually faster due to reduced NVLink overhead.
 +  * **Cons:** Not as strong at UI/frontend work. Beta pricing is set high (impacts rate limits). Will be replaced by GLM-5.1.
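
The beta prices above make the input-versus-output point concrete. Below is a back-of-the-envelope calculation; the token counts are made up, but the lopsided input:output ratio is typical of long agentic coding runs.

<code python>
# Hypothetical agentic coding request: token counts are illustrative only.
input_tokens, output_tokens = 200_000, 10_000

# Beta prices from this page, in dollars per million tokens.
price_in, price_out = 1.00, 3.00

cost_in = input_tokens / 1_000_000 * price_in     # $0.20
cost_out = output_tokens / 1_000_000 * price_out  # $0.03
print(f"total ${cost_in + cost_out:.2f}, input share {cost_in / (cost_in + cost_out):.0%}")
# -> total $0.23, input share 87%
</code>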
  
-  * **Pros:** Excels at almost all coding (see below) and long-horizon agentic work. Widely considered to be exceptionally good at code review.
 +=== Architecture Notes ===
  
-  * **Cons:** Quite a bit worse at user interface work. Overkill for basic assistant work, such as for OpenClaw. Worse at lateral thinking than other frontier models (needs more explicit guidance).
 +GLM-5 has fewer total parameters than Kimi K2.5, making it more efficient to serve on B200 hardware at NVFP4 quantization. However, vLLM and SGLang support for GLM-5's architecture was initially poor; Synthetic had to wait for upstream fixes before stable hosting was possible.
 + 
 +The model runs on SGLang (which is faster for the GLM series) and uses NVFP4 quantization on B200 GPUs. Each replica requires 4 B200 GPUs (tp4). 
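
As a rough illustration of that serving setup, the sketch below launches one tp4 replica through SGLang's ''launch_server'' entry point. The checkpoint name, port, and quantization value are assumptions (pre-quantized NVFP4 checkpoints may not need a quantization flag at all), and exact flags depend on the SGLang version.

<code python>
import subprocess

# Sketch of one replica as described above: 4-way tensor parallelism on B200s.
# The repo name, port, and quantization value are assumptions; flag spellings
# depend on the installed SGLang version.
subprocess.run(
    [
        "python", "-m", "sglang.launch_server",
        "--model-path", "zai-org/GLM-5",
        "--tp", "4",                       # one replica = 4 GPUs (tp4)
        "--quantization", "modelopt_fp4",  # assumed NVFP4 path; omit if baked into the checkpoint
        "--port", "30000",
    ],
    check=True,
)
</code>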
 + 
 +=== Deprecation Path === 
 + 
 +Based on public statements from Synthetic staff, the plan is: 
 + 
 +  - Take GLM-5 out of beta, stop self-hosting GLM-4.7
 +  - Put [[:models:glm-5.1|GLM-5.1]] in beta
 +  - Once GLM-5.1 is out of beta, retire/proxy GLM-5
 + 
 +Old models are typically proxied to Fireworks or TogetherAI, although how long a proxy stays up depends on demand, since proxies are expensive to run.
 + 
 +See also: [[:models:glm-5.1|GLM-5.1]] (the replacement), [[:models:glm-4.7|GLM-4.7]] (the predecessor)