--- limits [2026/04/19 04:30] – What changed: Fixed "fix set of per-hour" → "fixed set of 5-hour" (typo) Added Founder's Packs section — 50% more ($36/wk, 750/5hrs) at same $30/mo price Added History: Rate Limit Changes section — old system (135/5hrs + 500 tool calls/day = 1,148/day) vs gwyntel
+++ limits [2026/04/19 22:16] (current) – Added rate-limit evolution and v3 experiment details gwyntel
@@ Line 60: / Line 60: @@
 </WRAP>
-The 80% cache-read discount on the weekly token quota is currently **subscription-only**. Synthetic plans to roll it out to PAYG API in the future.
+The 80% cache-read discount on the weekly token quota is **subscription-only for now**. Synthetic has stated they plan to roll cache discounting out to PAYG in the future, but currently PAYG users pay full price for all tokens, including cache hits.
 See [[:prompt_caching]] for details on how the cache works and why hit rates vary.
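As a rough illustration of what the discount means in practice, the sketch below compares what one request's input tokens would draw from a subscription quota against what the same request would cost on PAYG. The per-token price and the token counts are made-up placeholders, not Synthetic's actual rates, and ''input_cost'' is a hypothetical helper, not part of any Synthetic API.

```python
CACHE_READ_DISCOUNT = 0.80  # from the text: cache reads are discounted 80% on subscriptions


def input_cost(uncached_tokens, cached_tokens, price_per_mtok, cache_discount):
    """Dollar cost of a request's input tokens.

    price_per_mtok is a hypothetical price per million input tokens.
    cache_discount is the fraction taken off cache-read tokens (0.0 = full price).
    """
    cached_rate = price_per_mtok * (1.0 - cache_discount)
    return (uncached_tokens * price_per_mtok + cached_tokens * cached_rate) / 1_000_000


# Hypothetical request: 10k fresh tokens, 90k tokens read from cache,
# at an assumed $1.00 per million input tokens.
subscription = input_cost(10_000, 90_000, 1.00, CACHE_READ_DISCOUNT)
payg = input_cost(10_000, 90_000, 1.00, 0.0)

print(f"subscription quota used: ${subscription:.3f}")
print(f"PAYG charge:             ${payg:.3f}")
```

With a high cache hit rate the gap is large, which is why long agentic sessions tend to stretch further on a subscription than the same traffic would on PAYG.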
@@ Line 101: / Line 101: @@
 Synthetic also provides //unlimited// access to [[models:nomic-embed-text-15|Nomic Embed Text 1.5]], an embedding model, and two LoRAs they trained for [[harnesses/octofriend]], ''[[https://huggingface.co/syntheticlab/diff-apply|hf:syntheticlab/diff-apply]]'' and ''[[https://huggingface.co/syntheticlab/fix-json|hf:syntheticlab/fix-json]]''. Like any model from Synthetic, they're usable in any harness. ''diff-apply'' applies find-replace style diffs to code and ''fix-json'' fixes broken JSON tool calls. They require particular input/output formats, so be sure to check the model cards.
+===== Rate-Limit Evolution =====
+<WRAP center round info 60%>
+Based on public statements from Synthetic staff. The rate-limiting system has gone through several iterations, each prompted by a way the previous one was abused.
+</WRAP>
+^ Phase ^ System ^ Problem / Outcome ^
+| **v1** | X requests per 5 hours + free tool calls | Users formatted any request as a tool call to get free requests. 3 users consumed >1/3 of total capacity. |
+| **v2** | Each tool call counts as a fraction of a request (e.g. 10%) | Formatting every request as a tool call could still stretch the quota by roughly 10x. |
+| **v3** (current) | Weekly token quota ($24/week per pack) + 500 requests per 5 hours | Token-based weekly limit eliminates tool call abuse. Requests weighted by output token cost. |
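The arithmetic behind the v1 and v2 problems can be sketched as follows. This is only an illustration: ''max_requests'' is a hypothetical helper, the 10% weight is the example figure from the table, and 135 is used as the window quota because it is the pre-v3 figure quoted on this page (the exact quota in each phase is not stated).

```python
import math


def max_requests(quota, tool_call_weight):
    """Upper bound on requests per window if every request is sent as a tool call.

    tool_call_weight is the fraction of one request that a tool call counts as.
    """
    if tool_call_weight == 0:
        return math.inf  # v1: tool calls were free, so the quota never ran out
    return quota / tool_call_weight


QUOTA = 135  # illustrative: the pre-v3 figure of 135 requests per 5 hours

print("v1 (free tool calls):   ", max_requests(QUOTA, 0.0))
print("v2 (10% per tool call): ", max_requests(QUOTA, 0.10))
print("no tool-call discount:  ", max_requests(QUOTA, 1.0))
```

Any weight below 1.0 leaves a multiplier of ''1 / weight'' available to whoever disguises requests as tool calls, which is the gap the token-based weekly quota closes.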
+The **rate-limit-v3 experiment** launched on April 7, 2026 after three weeks of opt-in testing. Key changes:
+  - **5-hour requests**: 500 per pack (up from 135), weighted by output token cost
+  - **Weekly tokens**: $24.00 worth of compute per pack (replaces daily tool call limits)
+  - **Tool calls**: No longer separately counted — all usage flows through the weekly token quota
+  - **Founder's packs**: 50% more ($36/week tokens, 750 requests/5 hours)
+  - **Concurrency**: 1 concurrent request per pack (2 for "small models": Nemotron 3 Super and GLM-4.7-Flash)
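The per-pack figures above can be combined into a small calculator. This is a sketch under stated assumptions: it assumes limits scale linearly with the number of packs, that Founder's packs leave concurrency unchanged, and that the daily ceiling is simply 24/5 back-to-back windows (the same arithmetic behind the old 1,148/day figure). ''PackLimits'' and ''limits_for'' are hypothetical names, not anything Synthetic exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackLimits:
    weekly_token_usd: float  # weekly token quota, in dollars of compute
    requests_per_5h: int     # weighted requests per 5-hour window
    concurrency: int         # concurrent requests (regular models)


STANDARD = PackLimits(weekly_token_usd=24.00, requests_per_5h=500, concurrency=1)
FOUNDERS_MULTIPLIER = 1.5  # "50% more" for Founder's packs


def limits_for(packs, founders=False):
    """Combined limits for a number of packs (assumes linear scaling)."""
    scale = FOUNDERS_MULTIPLIER if founders else 1.0
    return PackLimits(
        weekly_token_usd=STANDARD.weekly_token_usd * scale * packs,
        requests_per_5h=int(STANDARD.requests_per_5h * scale * packs),
        concurrency=STANDARD.concurrency * packs,  # assumed unchanged by Founder's status
    )


def daily_request_ceiling(requests_per_5h):
    """Theoretical requests per day if every 5-hour window is used in full."""
    return requests_per_5h * 24 / 5


print(limits_for(1))
print(limits_for(1, founders=True))
print("v3 ceiling:", daily_request_ceiling(STANDARD.requests_per_5h))
print("old ceiling:", daily_request_ceiling(135) + 500)  # 135/5h plus 500 tool calls/day
```

In practice the weekly token quota, not the request ceiling, is likely to be the binding limit for heavy use, so the daily figure is best read as an upper bound.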
+<WRAP center round tip>
+The weekly token quota means you don't need to think about "saving" tool calls vs regular requests anymore. Everything is just tokens. Unused 5-hour requests may eventually get "refunded" into the weekly token quota (proposed but not yet confirmed).
+</WRAP>
+=== Why Request-Based, Not Token-Based? ===
+Synthetic chose request-based limits over pure token-based limits for simplicity:
+  - Token-based pricing encourages gaming (deleting conversation history to save quota, splitting context)
+  - Request count follows a predictable pattern relative to cost
+  - Combining request limits with the weekly token quota mitigates the worst failure modes of each approach