Differences

This shows you the differences between two versions of the page.

--- limits [2026/04/19 05:05] – update limits with payg caveat gwyntel
+++ limits [2026/04/19 22:16] (current) – Added rate-limit evolution and v3 experiment details gwyntel
@@ Line 101: / Line 101: @@
 Synthetic also provides //unlimited// access to [[models:nomic-embed-text-15|Nomic Embed Text 1.5]], an embedding model, and two LoRAs they trained for [[harnesses/octofriend]], ''[[https://huggingface.co/syntheticlab/diff-apply|hf:syntheticlab/diff-apply]]'' and ''[[https://huggingface.co/syntheticlab/fix-json|hf:syntheticlab/fix-json]]''. Like any model from Synthetic, they're usable in any harness. ''diff-apply'' applies find-replace style diffs to code and ''fix-json'' fixes broken JSON tool calls. Both require particular input/output formats, so be sure to check the model cards.
+===== Rate-Limit Evolution =====
+<WRAP center round info 60%>
+This section is based on public statements from Synthetic staff. The rate-limiting system has gone through several iterations to close off abuse vectors.
+</WRAP>
+^ Phase ^ System ^ Problem / Outcome ^
+| **v1** | X requests per 5 hours + free tool calls | Users could format any request as a tool call to make it free. Three users consumed more than a third of total capacity. |
+| **v2** | Each tool call counts as a fraction of a request (e.g. 10%) | The discount could still be abused to get roughly 10x the quota. |
+| **v3** (current) | Weekly token quota ($24/week per pack) + 500 requests per 5 hours | The token-based weekly limit eliminates tool call abuse. Requests are weighted by output token cost. |
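The "~10x" figure for v2 follows from simple arithmetic. The sketch below is only illustrative: it assumes the 135-request window quoted in the v3 changes also applied under v2, and that every tool call was billed at a flat 10% of a request (the table's example figure). ''max_calls'' is a hypothetical helper, not part of any Synthetic API.

```python
def max_calls(quota_requests: int, tool_call_cost_percent: int) -> int:
    """Upper bound on calls per window if every call is billed as a tool call.

    quota_requests: requests allowed per window (assumed value, see lead-in).
    tool_call_cost_percent: share of one request charged per tool call.
    """
    if tool_call_cost_percent <= 0:
        # v1 case: tool calls were free, so there was no finite bound at all.
        raise ValueError("free tool calls give an unbounded number of calls")
    # Integer arithmetic avoids floating-point rounding on values like 0.1.
    return quota_requests * 100 // tool_call_cost_percent


if __name__ == "__main__":
    print(max_calls(135, 10))   # every call disguised as a tool call
    print(max_calls(135, 100))  # no discount: one call per request
```

Under these assumptions, a user who disguised every request as a tool call could make ten times as many calls as the nominal quota, which is the loophole v3 closes by metering tokens instead.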
+The **rate-limit-v3 experiment** launched on April 7, 2026 after three weeks of opt-in testing. Key changes:
+  - **5-hour requests**: 500 per pack (up from 135), weighted by output token cost
+  - **Weekly tokens**: $24.00 worth of compute per pack (replaces daily tool call limits)
+  - **Tool calls**: No longer separately counted — all usage flows through the weekly token quota
+  - **Founder's packs**: 50% more of each limit ($36/week in tokens, 750 requests per 5 hours)
+  - **Concurrency**: 1 request at a time per pack (2 for "small models": Nemotron 3 Super and GLM-4.7-Flash)
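The per-pack numbers above can be checked with a short calculation. This is a minimal sketch, not Synthetic's implementation: it assumes limits add up linearly when an account holds several packs (the page only states the limits "per pack"), and the names ''Limits'' and ''pack_limits'' are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    weekly_usd: int        # weekly token quota, in dollars of compute
    requests_per_5h: int   # request allowance per 5-hour window


# Standard pack values as listed for rate-limit-v3.
STANDARD = Limits(weekly_usd=24, requests_per_5h=500)

# Founder's packs get 50% more of each limit: multiply by 3/2.
FOUNDER = Limits(
    weekly_usd=STANDARD.weekly_usd * 3 // 2,
    requests_per_5h=STANDARD.requests_per_5h * 3 // 2,
)


def pack_limits(standard_packs: int = 0, founder_packs: int = 0) -> Limits:
    """Combined limits, ASSUMING packs stack additively (not confirmed)."""
    return Limits(
        weekly_usd=standard_packs * STANDARD.weekly_usd
        + founder_packs * FOUNDER.weekly_usd,
        requests_per_5h=standard_packs * STANDARD.requests_per_5h
        + founder_packs * FOUNDER.requests_per_5h,
    )


if __name__ == "__main__":
    print(pack_limits(founder_packs=1))
    print(pack_limits(standard_packs=2))
```

A single Founder's pack comes out at $36/week and 750 requests per 5 hours, matching the figures in the list.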
+<WRAP center round tip>
+The weekly token quota means you don't need to think about "saving" tool calls vs regular requests anymore. Everything is just tokens. Unused 5-hour requests may eventually get "refunded" into the weekly token quota (proposed but not yet confirmed).
+</WRAP>
+=== Why Request-Based, Not Token-Based? ===
+Synthetic chose request-based limits over pure token-based limits for simplicity:
+  - Token-based pricing encourages gaming (deleting conversation history to save quota, splitting context)
+  - Request counts track cost in a predictable way
+  - Pairing request limits with the weekly token quota mitigates the worst failure modes of each approach