--- limits [2026/04/10 01:06] – xenolandscapes
+++ limits [2026/04/19 22:16] (current) – Added rate-limit evolution and v3 experiment details gwyntel
@@ Line 1: / Line 1: @@
-===== Synthetic Subscription Limits and Pricing =====
+===== Synthetic Limits and Pricing =====
 ==== Subscription Pricing ====
@@ Line 7: / Line 7: @@
 **A base subscription is $30/mo**, and corresponds to one "pack."
-This subscription provides a fix set of **per-hour requests** and **weekly input and output token limits**.
+This subscription provides a fixed set of **5-hour requests**, **weekly token limits**, and **concurrency**.
-**If you add a ''pack'', you're essentially adding a second $30/mo subscription**: you get double the total usage for both hourly and weekly limits, double the recharge rate (since recharge is a percentage) and double the price. Add another pack, and it's triple the original price, limits, and recharge rate.
+**If you add a ''pack'', you're essentially adding a second $30/mo subscription**: you get double the total usage for both the 5-hour and weekly limits, double the recharge rate (since recharge is a percentage), double the concurrency, and double the price. Add another pack, and it's triple the original price, limits, recharge rate, and concurrency.
 **You can do this for up to 5 packs in total.**
+=== Founder's Packs ===
+<WRAP center round info>
+**Founder's packs** are worth **50% more** than standard packs — $36/week in tokens and 750 requests per 5 hours (vs $24/week and 500 for standard). Same $30/mo price.
+</WRAP>
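The pack arithmetic can be sketched in a few lines of Python. This is only an illustration of the multiplication described on this page: the names ''PackLimits'' and ''total_limits'' are made up for the example, the Founder's pack concurrency is assumed to match the standard pack (the page does not say), and mixing pack types is not modelled.

```python
from dataclasses import dataclass

MAX_PACKS = 5  # the page states a cap of 5 packs in total


@dataclass(frozen=True)
class PackLimits:
    """Limits for some number of packs (hypothetical helper type)."""
    price_usd_per_month: float
    requests_per_5h: int
    weekly_token_budget_usd: float
    concurrency: int


# Per-pack figures as listed on this page.
STANDARD = PackLimits(30.0, 500, 24.0, 1)
# Assumption: Founder's packs keep the standard concurrency of 1.
FOUNDERS = PackLimits(30.0, 750, 36.0, 1)


def total_limits(base: PackLimits, packs: int) -> PackLimits:
    """Scale every per-pack figure linearly by the number of packs."""
    if not 1 <= packs <= MAX_PACKS:
        raise ValueError(f"packs must be between 1 and {MAX_PACKS}")
    return PackLimits(
        price_usd_per_month=base.price_usd_per_month * packs,
        requests_per_5h=base.requests_per_5h * packs,
        weekly_token_budget_usd=base.weekly_token_budget_usd * packs,
        concurrency=base.concurrency * packs,
    )
```

For example, ''total_limits(STANDARD, 3)'' gives $90/mo, 1,500 requests per 5 hours, $72/week in tokens, and a concurrency of 3.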
 ==== Subscription Limits ====
@@ Line 20: / Line 26: @@
 | **5-Hour Requests** | 500 requests (weighted by output token cost) | 5% every 3 minutes | Prevents hardware/GPU overload |
 | **Weekly Tokens** | $24.00 worth of compute (uses [[https://synthetic.new/pricing?initial=usage|API prices]] + 80% cache-read discount) | 2% every 3 hours | Ensures long-term sustainability |
+| **Concurrency** | 1 at a time (2 for "small models": Nemotron 3 Super & GLM 4.7 Flash) | N/A | More simultaneous requests |
 | **Price** | $30.00 / month | N/A | Subsidizes "power users" |
-These limits deplete and recharge like a stamina or mana bar in a video game, instead of emptying out completely and then only refreshing at the end of the time limit, to avoid completely locking you out. The percentage each recharges by is calibrated to let a steady, not-too-small trickle of work continue to get done just based on the recharge increment if you're truly desperate.
+Founder's packs: **$36/week tokens** and **750 requests/5 hours** instead of the standard values above.
+These limits deplete and recharge like a stamina or mana bar in a video game, instead of emptying out completely and only refreshing at the end of the time limit, to avoid completely locking you out. The percentage each recharges by is calibrated to let a steady, not-too-small trickle of work continue to get done just based on the recharge increment if you're truly desperate.
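To get a feel for the recharge rates in the table, here is a rough Python sketch. It assumes recharge happens in discrete steps and that each step is a percentage of the full bar; Synthetic's actual implementation is not documented here, and the function names are invented for the example.

```python
def recharged_percent(elapsed_minutes: int, step_percent: int,
                      step_minutes: int, start_percent: int = 0) -> int:
    """Percent of the bar available after waiting, capped at 100."""
    steps = elapsed_minutes // step_minutes  # only completed steps count
    return min(100, start_percent + steps * step_percent)


def minutes_to_full(step_percent: int, step_minutes: int,
                    start_percent: int = 0) -> int:
    """Minutes until the bar is full again, starting from start_percent."""
    missing = 100 - start_percent
    steps = -(-missing // step_percent)  # ceiling division
    return steps * step_minutes


# 5-hour requests bar: 5% every 3 minutes.
print(minutes_to_full(5, 3))             # minutes from empty to full
# Weekly tokens bar: 2% every 3 hours (180 minutes).
print(recharged_percent(24 * 60, 2, 180))  # percent regained in one day
```

Under these assumptions an empty 5-hour bar refills in 60 minutes, and an empty weekly bar regains 16% per day (about $3.84 of a standard pack's $24) and is full again after 150 hours.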
 <WRAP tip center round>
@@ Line 30: / Line 39: @@
 The purpose of each of these bars is different:
-  - The hourly requests bar is designed to keep burst usage from getting too high, so that Synthetic's GPUs don't get overloaded and slow down Tokens Per Second and Time To First Token for everyone.
+  - The 5-hour requests bar is designed to keep burst usage from getting too high, so that Synthetic's GPUs don't get overloaded and slow down Tokens Per Second and Time To First Token for everyone.
   - The weekly limit is designed to ensure that nobody is able to use so much compute — which costs money, via electricity prices — that their subscription becomes //too// unprofitable for Synthetic.
+==== History: Rate Limit Changes ====
+<WRAP center round>
+Synthetic overhauled rate limits in mid-April 2026 after a 3-week opt-in experiment. Key changes:
+</WRAP>
+  - **Replaced daily tool call quota with weekly token-based quota** — the old system counted tool calls separately (500/day), which meant you could exhaust your tool calls while still having regular request budget unused. Token-based counting is more flexible and fairer: less load = less limiting, more load = more limiting.
+  - **5-hour request limit increased from 135 to 500 per pack** — previously max ~1,148 requests/day per pack (135 × 24/5 + 500 tool calls). Now 2,400 requests/day per pack. Founder's packs: 750 per 5 hours = 3,600/day.
+  - **Weekly token quota introduced at $24/week per pack** — guaranteed to always be better value than PAYG API pricing. Previously, some usage patterns were actually cheaper on PAYG than subscription.
+  - **Continuous regeneration** — hitting your weekly quota doesn't lock you out for a week. Wait a day and one day's worth regenerates automatically.
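The per-day figures in the list above follow from fitting 24/5 five-hour windows into a day. A small Python sketch of that arithmetic (the function name is illustrative, and the result is an upper bound that ignores request weighting):

```python
def max_requests_per_day(requests_per_5h: int, daily_tool_calls: int = 0) -> float:
    """Upper bound on daily requests: 24/5 windows per day plus any
    separately counted daily tool calls (old system only)."""
    return requests_per_5h * 24 / 5 + daily_tool_calls


old_standard = max_requests_per_day(135, daily_tool_calls=500)
new_standard = max_requests_per_day(500)
new_founders = max_requests_per_day(750)

print(old_standard, new_standard, new_founders)
```

This reproduces the numbers quoted above: 1,148 per day under the old system, 2,400 for a standard pack now, and 3,600 for a Founder's pack.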
+==== Cache Hit Reporting ====
+<WRAP info round>
+Cache hits and misses are now reported in API responses using the standard OpenAI and Anthropic response formats. This lets you see exactly how many of your input tokens were served from cache (and thus discounted 80%).
+</WRAP>
+The 80% cache-read discount on the weekly token quota is **subscription-only for now**. Synthetic has stated they plan to roll cache discounting out to PAYG in the future, but currently PAYG users pay full price for all tokens, including cache hits.
+See [[:prompt_caching]] for details on how the cache works and why hit rates vary.
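As a rough illustration of what the discount means for the weekly quota, the following Python sketch charges cached input tokens at 20% of the input price for subscribers and at full price for PAYG. The function name, the token counts, and the $0.60 input price in the usage line are placeholders, not Synthetic's real figures, and the sketch assumes cached tokens are a subset of the reported input tokens.

```python
CACHE_READ_DISCOUNT = 0.80  # from this page: cache reads are discounted 80%


def charge_usd(input_tokens: int, cached_input_tokens: int, output_tokens: int,
               input_price_per_million: float, output_price_per_million: float,
               subscription: bool = True) -> float:
    """Estimated cost of one request against the quota (or PAYG bill)."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    # Assumption: only subscribers get the cache-read discount for now.
    cached_price = input_price_per_million
    if subscription:
        cached_price *= 1 - CACHE_READ_DISCOUNT
    total = (uncached * input_price_per_million
             + cached_input_tokens * cached_price
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical request: 100k input tokens (80k from cache), 2k output tokens.
sub = charge_usd(100_000, 80_000, 2_000, 0.60, 3.40, subscription=True)
payg = charge_usd(100_000, 80_000, 2_000, 0.60, 3.40, subscription=False)
print(f"subscription: ${sub:.4f}, PAYG: ${payg:.4f}")
```

With these placeholder prices, the heavily cached request costs well under half as much against the subscription quota as it would at PAYG rates.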
+==== Request cost scaling ====
+Synthetic scales the "cost" of a request according to how expensive the requested model is based on [[https://synthetic.new/pricing?initial=usage|the PAYG pricing]]. A model costs exactly 1.0 requests if it has a token output price of $3.40/million. MiniMax M2.5, for example, costs 0.59 requests ($2.00/$3.40) because it's cheap to run. This makes Kimi K2.5 cost 1.0 requests, and most other models cost less.
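The scaling rule above is a simple ratio, sketched here in Python. The function names are made up for the example, and whether a model priced above $3.40/million costs more than 1.0 requests is not stated on this page, so the sketch simply applies the ratio without a cap.

```python
REFERENCE_OUTPUT_PRICE = 3.40  # USD per million output tokens == 1.0 requests


def request_cost(output_price_per_million: float) -> float:
    """Weighted cost of one request, in units of 'requests'."""
    return output_price_per_million / REFERENCE_OUTPUT_PRICE


def effective_requests(budget: float, output_price_per_million: float) -> float:
    """How many calls to a model fit into a given request budget."""
    return budget / request_cost(output_price_per_million)


print(round(request_cost(2.00), 2))        # MiniMax M2.5 example from the text
print(round(effective_requests(500, 2.00)))  # calls per 5-hour bar at that price
```

So at a $2.00 output price, a standard pack's 500-request bar would stretch to roughly 850 calls, assuming the weighting works exactly as described.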
 ==== Profitability ====
@@ Line 60: / Line 93: @@
   * [[https://ngrok.com/blog/quantization|Quantization from the ground up]]
-  * [[https://www.theregister.com/2026/03/07/ai_inference_economics/|Unpacking the deceptively simple science of tokenomics]]
+  * [[https://www.theregister.com/2026/03/07/ai_inference-costs-unpacked/|Unpacking the deceptively simple science of tokenomics]]
   * [[https://venturebeat.com/infrastructure/ai-inference-costs-dropped-up-to-10x-on-nvidias-blackwell-but-hardware-is|AI inference costs dropped up to 10x on Nvidia's Blackwell — but hardware is only half the equation]]
+===== Additional Limits =====
+Synthetic also provides //unlimited// access to [[models:nomic-embed-text-15|Nomic Embed Text 1.5]], an embedding model, and two LoRAs they trained for [[harnesses/octofriend]], ''[[https://huggingface.co/syntheticlab/diff-apply|hf:syntheticlab/diff-apply]]'' and ''[[https://huggingface.co/syntheticlab/fix-json|hf:syntheticlab/fix-json]]''. Like any model from Synthetic, they're usable in any harness. ''diff-apply'' applies find-replace style diffs to code and ''fix-json'' fixes broken JSON tool calls. They require particular input/output formats, so be sure to check the model cards.
+===== Rate-Limit Evolution =====
+<WRAP center round info 60%>
+Based on public statements from Synthetic staff. The rate limiting system has gone through several iterations due to abuse vectors.
+</WRAP>
+^ Phase ^ System ^ Problem ^
+| **v1** | X requests per 5 hours + free tool calls | Users formatted any request as a tool call to get free requests. 3 users consumed >1/3 of total capacity. |
+| **v2** | Tool calls count as percentage of requests (e.g. 10%) | Percentage-based discount could still be abused for ~10x the quota. |
+| **v3** (current) | Weekly token quota ($24/week per pack) + 500 requests per 5 hours | Token-based weekly limit eliminates tool call abuse. Requests weighted by output token cost. |
+The **rate-limit-v3 experiment** launched on April 7, 2026 after three weeks of opt-in testing. Key changes:
+  - **5-hour requests**: 500 per pack (up from 135), weighted by output token cost
+  - **Weekly tokens**: $24.00 worth of compute per pack (replaces daily tool call limits)
+  - **Tool calls**: No longer separately counted — all usage flows through the weekly token quota
+  - **Founder's packs**: 50% more ($36/week tokens, 750 requests/5 hours)
+  - **Concurrency**: 1 at a time per pack (2 for "small models": Nemotron 3 Super & GLM-4.7-Flash)
+<WRAP center round tip>
+The weekly token quota means you don't need to think about "saving" tool calls vs regular requests anymore. Everything is just tokens. Unused 5-hour requests may eventually get "refunded" into the weekly token quota (proposed but not yet confirmed).
+</WRAP>
+=== Why Request-Based, Not Token-Based? ===
+Synthetic chose request-based limits over pure token-based limits for simplicity:
+  - Token-based pricing encourages gaming (deleting conversation history to save quota, splitting context)
+  - Request count follows a predictable pattern relative to cost
+  - With the weekly token quota, the worst of both approaches is mitigated