Organic

Differences

This shows you the differences between two versions of the page.

limits [2026/04/19 04:30] – What changed: Fixed "fix set of per-hour" → "fixed set of 5-hour" (typo). Added Founder's Packs section: 50% more ($36/wk, 750/5hrs) at same $30/mo price. Added "History: Rate Limit Changes" section: old system (135/5hrs + 500 tool calls/day = 1,148/day) vs gwyntel
limits [2026/04/19 22:16] (current) – Added rate-limit evolution and v3 experiment details gwyntel
Line 60: Line 60:
 </WRAP>
  
-The 80% cache-read discount on the weekly token quota is currently **subscription-only**. Synthetic plans to roll it out to PAYG API in the future.
+The 80% cache-read discount on the weekly token quota is **subscription-only for now**. Synthetic has stated they plan to roll cache discounting out to PAYG in the future, but currently PAYG users pay full price for all tokens, including cache hits.
  
 See [[:prompt_caching]] for details on how the cache works and why hit rates vary.
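
To make the subscription/PAYG difference concrete, here is a minimal sketch of the arithmetic. It assumes the 80% discount means cache-read tokens count at 20% of their normal cost; the function name and the example amounts are hypothetical, not Synthetic's actual billing logic.

```python
# Assumption: an 80% cache-read discount means cache hits count at 20% of
# their normal cost. Amounts are in integer cents to avoid float rounding.
CACHE_READ_DISCOUNT_PERCENT = 80  # figure taken from the text above


def quota_charge_cents(uncached_cents: int, cache_read_cents: int,
                       subscription: bool) -> int:
    """Return the cost counted against quota (subscription) or billed (PAYG)."""
    if subscription:
        # Subscription: cache reads are discounted before being counted.
        cache_part = cache_read_cents * (100 - CACHE_READ_DISCOUNT_PERCENT) // 100
    else:
        # PAYG: full price for all tokens, including cache hits.
        cache_part = cache_read_cents
    return uncached_cents + cache_part


# Hypothetical usage: $1.00 of uncached tokens plus $2.00 of cache reads.
sub_cents = quota_charge_cents(100, 200, subscription=True)
payg_cents = quota_charge_cents(100, 200, subscription=False)
print(sub_cents, payg_cents)
```

Under these assumptions the same traffic counts as 140 cents on a subscription and 300 cents on PAYG, which is why cache-heavy workloads benefit most from the discount.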
Line 101: Line 101:
 Synthetic also provide //unlimited// access to [[models:nomic-embed-text-15|Nomic Embed Text 1.5]], an embedding model, and two LORAs they trained for [[harnesses/octofriend]], ''[[https://huggingface.co/syntheticlab/diff-apply|hf:syntheticlab/diff-apply]]'' and ''[[https://huggingface.co/syntheticlab/fix-json|hf:syntheticlab/fix-json]]''. Like any model from Synthetic, they're usable in any harness. ''diff-apply'' applies find-replace style diffs to code and ''fix-json'' fixes broken JSON tool calls. They require particular input/output formats, so be sure to check the model cards.
  
 +
 +===== Rate-Limit Evolution =====
 +
 +<WRAP center round info 60%>
 +Based on public statements from Synthetic staff. The rate-limiting system has gone through several iterations, each responding to an abuse vector in the previous design.
 +</WRAP>
 +
 +^ Phase ^ System ^ Problem / outcome ^
 +| **v1** | X requests per 5 hours + free tool calls | Users formatted any request as a tool call to get free requests. Three users consumed >1/3 of total capacity. |
 +| **v2** | Tool calls count as a percentage of a request (e.g. 10%) | Percentage-based discount could still be abused for ~10x the quota. |
 +| **v3** (current) | Weekly token quota ($24/week per pack) + 500 requests per 5 hours | Token-based weekly limit eliminates tool call abuse. Requests weighted by output token cost. |
 +
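
The "~10x" figure for v2 follows from simple arithmetic, sketched below. The quota of 100 requests is a hypothetical placeholder (the table only says "X requests"), and ''max_tool_calls'' is an illustrative helper, not anything Synthetic has published.

```python
from typing import Optional


def max_tool_calls(request_quota: int, tool_call_percent: int) -> Optional[int]:
    """Most calls a user can make in one window by sending only tool calls.

    tool_call_percent is how much of one request a tool call costs:
    0 models v1 (free tool calls), 10 models the v2 example, and
    100 means tool calls are charged like any other request.
    """
    if tool_call_percent == 0:
        return None  # v1: free tool calls, so there is no upper bound
    return request_quota * 100 // tool_call_percent


HYPOTHETICAL_QUOTA = 100  # placeholder for "X requests per 5 hours"

print(max_tool_calls(HYPOTHETICAL_QUOTA, 0))    # v1: unbounded
print(max_tool_calls(HYPOTHETICAL_QUOTA, 10))   # v2: ten times the quota
print(max_tool_calls(HYPOTHETICAL_QUOTA, 100))  # no discount: exactly the quota
```

Any discount below 100% leaves a multiplier that a client can reach by dressing every request up as a tool call, which is the gap the weekly token quota in v3 closes.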
 +The **rate-limit-v3 experiment** launched on April 7, 2026 after three weeks of opt-in testing. Key changes:
 +
 +  - **5-hour requests**: 500 per pack (up from 135), weighted by output token cost
 +  - **Weekly tokens**: $24.00 worth of compute per pack (replaces daily tool call limits)
 +  - **Tool calls**: No longer separately counted — all usage flows through the weekly token quota
 +  - **Founder's packs**: 50% more ($36/week tokens, 750 requests/5 hours)
 +  - **Concurrency**: 1 at a time per pack (2 for "small models": Nemotron 3 Super & GLM-4.7-Flash)
 +
 +<WRAP center round tip>
 +The weekly token quota means you don't need to think about "saving" tool calls vs regular requests anymore. Everything is just tokens. Unused 5-hour requests may eventually get "refunded" into the weekly token quota (proposed but not yet confirmed).
 +</WRAP>
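
The v3 figures above can be collected into a small calculator. This is a sketch under stated assumptions: that limits stack linearly with the number of packs (suggested by "per pack", but not confirmed here), and that the Founder's 50% bonus applies to requests and weekly tokens but not to concurrency. All names are hypothetical.

```python
from dataclasses import dataclass

# Per-pack figures taken from the list above.
REQUESTS_PER_5H = 500
WEEKLY_USD = 24


@dataclass(frozen=True)
class Limits:
    requests_per_5h: int
    weekly_usd: int
    concurrency: int


def v3_limits(packs: int, founders: bool = False,
              small_model: bool = False) -> Limits:
    """Estimate v3 limits; assumes every limit scales linearly per pack."""
    # Founder's packs get 50% more, i.e. a 3/2 multiplier (integer math).
    num, den = (3, 2) if founders else (1, 1)
    # "Small models" allow 2 concurrent requests per pack, others 1.
    per_pack_concurrency = 2 if small_model else 1
    return Limits(
        requests_per_5h=packs * REQUESTS_PER_5H * num // den,
        weekly_usd=packs * WEEKLY_USD * num // den,
        concurrency=packs * per_pack_concurrency,
    )


# Cross-check of the old system's daily ceiling quoted in the revision
# summary: 135 requests per 5-hour window over 24 hours, plus 500 tool calls.
old_daily_ceiling = 135 * 24 // 5 + 500

print(v3_limits(1))
print(v3_limits(1, founders=True))
print(old_daily_ceiling)
```

For one Founder's pack this reproduces the 750 requests per 5 hours and $36/week stated above, and the old-system check yields the 1,148/day figure. The request counts are not directly comparable across systems, since v3 requests are weighted by output token cost.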
 +
 +=== Why Request-Based, Not Token-Based? ===
 +
 +Synthetic chose request-based limits over pure token-based limits for simplicity:
 +
 +  - Token-based pricing encourages gaming (deleting conversation history to save quota, splitting context)
 +  - Request count follows a predictable pattern relative to cost
 +  - Combining request limits with the weekly token quota mitigates the worst failure modes of both approaches