The Synthetic subscription — not to be confused with usage-based pricing — works in terms of “packs”.
A base subscription is $30/mo, and corresponds to one “pack.”
This subscription provides a fixed set of 5-hour requests, weekly token limits, and concurrency.
If you add a pack, you’re essentially adding a second $30/mo subscription: you get double the total usage for both hourly and weekly limits, double the recharge rate (since recharge is a percentage), double the concurrency, and double the price. Add another pack, and it’s triple the original price, limits, recharge rate, and concurrency.
You can do this for up to 5 packs in total.
Founder’s packs are worth 50% more than standard packs — $36/week in tokens and 750 requests per 5 hours (vs $24/week and 500 for standard). Same $30/mo price.
Your subscription has two limits, both scaled by how many packs you have. Each limit recharges by a fixed percentage of its total capacity over a fixed time interval, calibrated so that an empty bar refills within the time frame it represents. See the following table:
| Limit Type | Capacity (Per Pack) | Recharge Rate | Primary Purpose |
|---|---|---|---|
| 5-Hour Requests | 500 requests (weighted by output token cost) | 5% every 3 minutes | Prevents hardware/GPU overload |
| Weekly Tokens | $24.00 worth of compute (uses API prices + 80% cache-read discount) | 2% every 3 hours | Ensures long-term sustainability |
| Concurrency | 1 at a time (2 for “small models”: Nemotron 3 Super & GLM 4.7 Flash) | N/A | More simultaneous requests |
| Price | $30.00 / month | N/A | Subsidizes “power users” |
Founder’s packs: $36/week tokens and 750 requests/5 hours instead of the standard values above.
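
As a rough sketch of the scaling rule, the per-pack figures from the table can simply be multiplied by the pack count. The `PackLimits` dataclass and `limits_for` helper are hypothetical names used for illustration, not part of any Synthetic API, and the sketch assumes every pack on a subscription is of the same kind (all standard or all Founder’s).

```python
from dataclasses import dataclass

# Per-pack figures taken from the table above.
STANDARD = {"requests_5h": 500, "weekly_usd": 24.0}
FOUNDERS = {"requests_5h": 750, "weekly_usd": 36.0}
PRICE_PER_PACK_USD = 30.0
CONCURRENCY_PER_PACK = 1  # regular models; small models get 2 per pack
MAX_PACKS = 5


@dataclass(frozen=True)
class PackLimits:
    requests_5h: int
    weekly_usd: float
    concurrency: int
    monthly_price_usd: float


def limits_for(packs: int, founders: bool = False) -> PackLimits:
    """Scale the per-pack limits linearly with the number of packs."""
    if not 1 <= packs <= MAX_PACKS:
        raise ValueError(f"packs must be between 1 and {MAX_PACKS}")
    base = FOUNDERS if founders else STANDARD
    return PackLimits(
        requests_5h=base["requests_5h"] * packs,
        weekly_usd=base["weekly_usd"] * packs,
        concurrency=CONCURRENCY_PER_PACK * packs,
        monthly_price_usd=PRICE_PER_PACK_USD * packs,
    )
```

For example, `limits_for(3)` describes a three-pack subscription: triple the requests, weekly tokens, concurrency, and price of a single pack.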
These limits deplete and recharge like a stamina or mana bar in a video game, instead of emptying out completely and only refreshing at the end of the time limit, so that you are never completely locked out. The recharge percentage is calibrated so that, if you’re truly desperate, each recharge increment alone lets a steady, not-too-small trickle of work continue.
Even if you use up your weekly limit in the middle of the day, waiting 3 hours restores 2% of it: $0.72 of tokens on a single Founder’s pack (2% of $36), or $0.48 on a single standard pack (2% of $24). That might not sound like much, but $0.72 equals about 500k 3:1 mixed input and output tokens.
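
The recharge arithmetic can be sketched as follows, using the percentages and intervals from the table. The function names are illustrative, and the sketch assumes recharge is applied in discrete steps to an empty bar with no usage in between; Synthetic’s actual implementation may differ.

```python
import math


def recharge_step(capacity: float, percent_per_step: float) -> float:
    """Amount restored by a single recharge step."""
    return capacity * percent_per_step / 100.0


def hours_to_full(percent_per_step: float, step_hours: float) -> float:
    """Hours for an empty bar to refill, assuming discrete steps and no usage."""
    steps = math.ceil(100.0 / percent_per_step)
    return steps * step_hours


# Weekly token bar: 2% every 3 hours.
weekly_step_standard = recharge_step(24.0, 2)   # one standard pack, in USD
weekly_step_founders = recharge_step(36.0, 2)   # one Founder's pack, in USD
weekly_refill_hours = hours_to_full(2, 3)

# 5-hour request bar: 5% every 3 minutes.
request_step = recharge_step(500, 5)            # requests, one standard pack
request_refill_hours = hours_to_full(5, 3 / 60)
```

With the table’s figures, one weekly step is 2% of $24 = $0.48 on a standard pack and 2% of $36 = $0.72 on a Founder’s pack. A full refill takes 50 steps (150 hours) for the weekly bar and 20 steps (1 hour) for the request bar, both inside their respective windows.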
The purpose of each of these bars is different:
Synthetic overhauled rate limits in mid-April 2026 after a 3-week opt-in experiment. Key changes:
Cache hits and misses are now reported in API responses using the standard OpenAI and Anthropic response formats. This lets you see exactly how many of your input tokens were served from cache (and thus discounted 80%).
The 80% cache-read discount on the weekly token quota is subscription-only for now. Synthetic has stated they plan to roll cache discounting out to PAYG in the future, but currently PAYG users pay full price for all tokens, including cache hits.
See prompt_caching for details on how the cache works and why hit rates vary.
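
To illustrate how the reported cache counts relate to the weekly quota, here is a minimal sketch that reads a usage object in the OpenAI chat-completions shape (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`) and applies the 80% cache-read discount. The per-million prices are placeholder values rather than Synthetic’s actual rates, `quota_cost_usd` is a hypothetical helper, and the exact way Synthetic computes the charge internally is an assumption.

```python
CACHE_READ_DISCOUNT = 0.80  # subscription-only, per the text above


def quota_cost_usd(usage: dict, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the weekly-quota charge for one response.

    Assumes cached input tokens are charged at 20% of the input price
    and all other tokens at full price.
    """
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    uncached = prompt - cached  # in this format, prompt_tokens includes cached tokens
    output = usage.get("completion_tokens", 0)

    input_cost = uncached * input_price_per_m / 1e6
    cached_cost = cached * input_price_per_m * (1 - CACHE_READ_DISCOUNT) / 1e6
    output_cost = output * output_price_per_m / 1e6
    return input_cost + cached_cost + output_cost


# Example usage payload with a 90% cache hit rate on the prompt.
usage = {
    "prompt_tokens": 100_000,
    "completion_tokens": 2_000,
    "prompt_tokens_details": {"cached_tokens": 90_000},
}
# Placeholder prices in USD per million tokens.
cost = quota_cost_usd(usage, input_price_per_m=1.00, output_price_per_m=3.00)
```

Under these placeholder prices the request is charged about $0.034 against the weekly quota instead of about $0.106 without the discount, which is why high cache hit rates stretch a subscription considerably further.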
Synthetic scales the “cost” of a request according to how expensive the requested model is, based on its PAYG pricing. A model costs exactly 1.0 requests if its output token price is $3.40/million. Kimi K2.5 therefore costs 1.0 requests, and most other models cost less. MiniMax M2.5, for example, costs 0.59 requests ($2.00/$3.40) because it’s cheap to run.
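
The weighting above amounts to a single division. A minimal sketch follows, with hypothetical function names and the assumption that the weight is applied without any cap, floor, or rounding:

```python
BASELINE_OUTPUT_PRICE_PER_M = 3.40  # USD per million output tokens = 1.0 requests


def request_weight(output_price_per_m: float) -> float:
    """Requests consumed by one call to a model with the given output price."""
    return output_price_per_m / BASELINE_OUTPUT_PRICE_PER_M


def calls_per_bar(capacity: float, output_price_per_m: float) -> float:
    """How many calls to one model fit in a full 5-hour bar."""
    return capacity / request_weight(output_price_per_m)


minimax_weight = request_weight(2.00)     # MiniMax M2.5, per the text above
minimax_calls = calls_per_bar(500, 2.00)  # one standard pack
```

Here `minimax_weight` rounds to 0.59, so under these assumptions a single standard pack’s 500 weighted requests stretch to roughly 850 MiniMax M2.5 calls per full bar.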
Since you’re paying $30/mo for $24/wk worth of tokens (roughly $104/mo at API prices, taking 52/12 ≈ 4.33 weeks per month), fully using your subscription every week would already make you very unprofitable for Synthetic.
However, very few people actually do that. Users who consume well under the value of their subscription are net-profitable for Synthetic, and they subsidize the very few users who overuse theirs.
Additionally, even those most expensive users are rarely able to overuse their subscription consistently, unless they’re using some kind of automation like OpenClaw, which the hourly limit prevents. Humans simply don’t have that kind of stamina or that consistent a work schedule. Therefore, the weeks or months where you underutilize your subscription also subsidize the weeks or months where you overutilize it!
The benefit for everyone, even those who underutilize their subscription, is the option to overutilize when needed, along with consistent, predictable pricing.
As to whether they are currently profitable, as of April 8th, 2026, one of the paid interns on the Discord had this to say:
“[…] what i can say is: we’re margin-profitable on our GPUs finally [this includes subscriptions and API costs], and with ~2x growth we think we’d probably just be profitable overall (i.e. covering existing salaries, office space, AWS bill).”
Synthetic also provides unlimited access to Nomic Embed Text 1.5, an embedding model, and two LoRAs they trained for octofriend, hf:syntheticlab/diff-apply and hf:syntheticlab/fix-json. Like any model from Synthetic, they’re usable in any harness. diff-apply applies find-replace style diffs to code, and fix-json fixes broken JSON tool calls. Both require particular input/output formats, so be sure to check the model cards.
The following is based on public statements from Synthetic staff. The rate-limiting system has gone through several iterations due to abuse vectors.
| Phase | System | Problem |
|---|---|---|
| v1 | X requests per 5 hours + free tool calls | Users formatted any request as a tool call to get free requests. 3 users consumed >1/3 of total capacity. |
| v2 | Tool calls count as percentage of requests (e.g. 10%) | Percentage-based discount could still be abused for ~10x the quota. |
| v3 (current) | Weekly token quota ($24/week per pack) + 500 requests per 5 hours | Token-based weekly limit eliminates tool call abuse. Requests weighted by output token cost. |
The rate-limit-v3 experiment launched on April 7, 2026 after three weeks of opt-in testing. Key changes:
The weekly token quota means you don’t need to think about “saving” tool calls vs regular requests anymore. Everything is just tokens. Unused 5-hour requests may eventually get “refunded” into the weekly token quota (proposed but not yet confirmed).
Synthetic chose request-based limits over pure token-based limits for simplicity: