Synthetic Subscription Limits and Pricing
Subscription Pricing
The Synthetic subscription — not to be confused with usage-based pricing — works in terms of “packs”.
A base subscription is $30/mo, and corresponds to one “pack.”
This subscription provides a fixed set of per-hour request limits, weekly input and output token limits, and a concurrency cap.
If you add a pack, you're essentially adding a second $30/mo subscription: your total hourly and weekly limits double, your absolute recharge rate doubles (since recharge is a fixed percentage of a now-larger total), and so does your price. Add another pack, and it's triple the original price, limits, and recharge rate.
You can do this for up to 5 packs in total.
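The linear scaling described above can be sketched in a few lines. This is a minimal illustration using the per-pack figures quoted in this article ($30/mo, 500 requests per 5-hour window, $24 of weekly tokens); the function and constant names are hypothetical, not part of any Synthetic API.

```python
# Hypothetical sketch of how pack count scales subscription limits.
# Per-pack base values are taken from the article's table.
BASE_PRICE_USD = 30.00    # per pack, per month
BASE_REQUESTS_5H = 500    # per pack, per 5-hour window
BASE_WEEKLY_USD = 24.00   # per pack, weekly token budget
MAX_PACKS = 5

def subscription_limits(packs: int) -> dict:
    """Every limit scales linearly with the number of packs."""
    if not 1 <= packs <= MAX_PACKS:
        raise ValueError(f"packs must be between 1 and {MAX_PACKS}")
    return {
        "price_usd_per_month": BASE_PRICE_USD * packs,
        "requests_per_5h": BASE_REQUESTS_5H * packs,
        "weekly_token_budget_usd": BASE_WEEKLY_USD * packs,
    }

print(subscription_limits(3))
# → {'price_usd_per_month': 90.0, 'requests_per_5h': 1500,
#    'weekly_token_budget_usd': 72.0}
```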
Subscription Limits
Your subscription has two limits, determined by how many packs you have. Both limits recharge by a fixed percentage of their total (determined by how many packs you have) over a fixed time interval, calculated to ensure that each bar would refill from empty within the time-frame represented. See the following table:
| Limit Type | Capacity (Per Pack) | Recharge Rate | Primary Purpose |
|---|---|---|---|
| 5-Hour Requests | 500 requests (weighted by output token cost) | 5% every 3 minutes | Prevents hardware/GPU overload |
| Weekly Tokens | $24.00 worth of compute (uses API prices + 80% cache-read discount) | 2% every 3 hours | Ensures long-term sustainability |
| Price | $30.00 / month | N/A | Subsidizes “power users” |
These limits deplete and recharge like a stamina or mana bar in a video game: rather than emptying completely and refreshing only at the end of the window, they regenerate continuously, so you're never fully locked out. Each recharge percentage is calibrated so that, if you're truly desperate, a steady (if small) trickle of work can continue on the recharge increment alone.
Even if you use up your weekly limit in the middle of the day, after 3 hours you'll have $0.48 of tokens, assuming a single pack (2% of the $24 weekly capacity). That might not sound like much, but it equals roughly 333k tokens at a 3:1 input-to-output mix.
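The stamina-bar mechanic can be modeled directly from the table's weekly-limit figures ($24.00 capacity per pack, recharging 2% of total capacity every 3 hours). A minimal sketch, with a hypothetical function name and the simplifying assumption that no new usage occurs while waiting:

```python
# Sketch of the "stamina bar" recharge mechanic for the weekly limit,
# using the table's figures: $24.00 capacity per pack, with 2% of
# total capacity restored every 3 hours.
def weekly_balance_after(hours_waited: float, packs: int = 1,
                         start_balance: float = 0.0) -> float:
    """Dollar balance after waiting, assuming no usage in the meantime."""
    capacity = 24.00 * packs
    recharge_per_tick = 0.02 * capacity   # 2% of total per tick
    ticks = int(hours_waited // 3)        # one tick every 3 hours
    return min(capacity, start_balance + recharge_per_tick * ticks)

# Deplete the bar entirely, then wait 3 hours on a single pack:
print(weekly_balance_after(3))    # → 0.48
# A full refill from empty takes 50 ticks = 150 hours (~6.25 days):
print(weekly_balance_after(150))  # → 24.0
```

Note that at 2% per tick, a full refill takes just over six days, which matches the article's claim that each bar refills from empty within roughly the time frame it represents.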
The purpose of each of these bars is different:
- The hourly requests bar is designed to keep burst usage from getting too high, so that Synthetic’s GPUs don’t get overloaded and slow down Tokens Per Second and Time To First Token for everyone.
- The weekly limit is designed to ensure that nobody is able to use so much compute — which costs money, via electricity prices — that their subscription becomes too unprofitable for Synthetic.
Profitability
Since you’re paying $30/mo for $24/wk worth of tokens, if you fully used your subscription to the maximum every week, you would already be very unprofitable for Synthetic.
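The arithmetic behind that claim is worth making explicit. This back-of-envelope sketch (variable names are illustrative) compares the monthly price to the maximum token value a fully used single pack delivers per month:

```python
# Back-of-envelope arithmetic for the claim above: a fully used
# weekly budget is worth far more than the monthly price.
WEEKS_PER_MONTH = 52 / 12          # ≈ 4.33

monthly_price = 30.00                               # one pack
max_monthly_token_value = 24.00 * WEEKS_PER_MONTH   # fully used, per month
print(f"${max_monthly_token_value:.2f}")            # → $104.00

# Break-even utilization: a user is profitable for Synthetic only if
# they use less than this fraction of their weekly budget on average.
breakeven = monthly_price / max_monthly_token_value
print(f"{breakeven:.0%}")                           # → 29%
```

In other words, a single-pack subscriber who consistently burns their full weekly budget consumes about $104/mo of compute for a $30/mo price.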
However, very few people actually do that. Users who consume well under the value of their subscription (and are therefore net-profitable for Synthetic) subsidize the very few who overuse theirs.
Additionally, those most expensive users — unless they’re using some kind of automation like OpenClaw, which the hourly limit prevents — are rarely able to actually consistently overuse their subscription, simply because humans don’t have that kind of stamina and consistent work schedule. Therefore, the weeks or months where you underutilize your subscription also subsidize the weeks or months where you overutilize it!
The benefit for everyone, even those who underutilize their subscription, is the option to overutilize when needed, along with consistent, predictable pricing.
As to whether they are currently profitable, as of April 8th, 2026, one of the paid interns on the Discord had this to say:
“[…] what i can say is: we’re margin-profitable on our GPUs finally [this includes subscriptions and API costs], and with ~2x growth we think we’d probably just be profitable overall (i.e. covering existing salaries, office space, AWS bill).”
LLM Inference Unit Economics Resources
- LLM Inference Economics from First Principles (use this to understand what calculations to do: the underlying principles, equations, and factors in play. Don't rely on it for the actual values in play, or, by extension, for the economics of inference in 2026, because nearly lossless quants like NVFP4 and much more powerful, higher-HBM GPUs like the B200 completely change the numbers, which in turn completely changes the economics).
- Are OpenAI and Anthropic Really Losing Money on Inference? (use this to get a ballpark idea of how much it costs to run models closer to the present-day, and how that compares to subscriptions).
- LLMs Are Cheap (use this to get a sense of the market for open weight models).