limits [Organic]

Synthetic Limits and Pricing

Subscription Pricing

The Synthetic subscription — not to be confused with usage-based pricing — works in terms of “packs”.

A base subscription is $30/mo, and corresponds to one “pack.”

This subscription provides a fixed set of limits: a rolling request allowance, a weekly budget for input and output tokens, and a concurrency cap.

If you add a pack, you’re essentially adding a second $30/mo subscription: you get double the total usage for both hourly and weekly limits, double the recharge rate (since recharge is a percentage), double the concurrency, and double the price. Add another pack, and it’s triple the original price, limits, recharge rate, and concurrency.

You can do this for up to 5 packs in total.
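The pack arithmetic can be sketched in a few lines of Python. This is only an illustration of the linear scaling described above, using the per-pack figures from the limits table; the `PackLimits` and `limits_for` names are made up for this example and are not part of any Synthetic API.

```python
from dataclasses import dataclass

MAX_PACKS = 5  # the text caps a subscription at 5 packs


@dataclass(frozen=True)
class PackLimits:
    """Hypothetical container for the totals that scale with pack count."""
    packs: int
    price_usd_per_month: float
    requests_per_5h: int
    weekly_budget_usd: float
    concurrency: int


def limits_for(packs: int) -> PackLimits:
    """Scale the single-pack figures linearly by the number of packs."""
    if not 1 <= packs <= MAX_PACKS:
        raise ValueError(f"packs must be between 1 and {MAX_PACKS}")
    return PackLimits(
        packs=packs,
        price_usd_per_month=30.00 * packs,  # $30.00 / month per pack
        requests_per_5h=500 * packs,        # 500 requests per pack
        weekly_budget_usd=24.00 * packs,    # $24.00 of compute per pack
        concurrency=1 * packs,              # 1 at a time per pack
    )


if __name__ == "__main__":
    for n in range(1, MAX_PACKS + 1):
        print(limits_for(n))
```

Because recharge is a percentage of the total, the absolute amount recharged per step scales with the pack count as well, without needing its own field here.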

Subscription Limits

Your subscription has two limits, both scaled by how many packs you have. Each recharges by a fixed percentage of its total capacity at a fixed interval, chosen so that the bar refills from empty within the time frame it represents. See the following table:

Limit Type	Capacity (Per Pack)	Recharge Rate	Primary Purpose
5-Hour Requests	500 requests (weighted by output token cost)	5% every 3 minutes	Prevents hardware/GPU overload
Weekly Tokens	$24.00 worth of compute (uses API prices + 80% cache-read discount)	2% every 3 hours	Ensures long-term sustainability
Concurrency	1 at a time (2 for small models)	N/A	More simultaneous requests
Price	$30.00 / month	N/A	Subsidizes “power users”
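As a rough sketch of how a request might be charged against the weekly bar, the function below prices tokens at per-million-token API rates and bills cache-read input at 20% of the input rate. Two things here are assumptions rather than anything the table states: that the 80% discount applies to the input price of cache-read tokens, and the prices in the usage example, which are placeholders and not Synthetic's rates for any model.

```python
CACHE_READ_DISCOUNT = 0.80  # from the table: 80% off for cache reads


def request_cost_usd(
    fresh_input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the dollar cost of one request against the weekly bar.

    Assumes cache-read tokens are billed at the input price minus the
    discount. Prices are in USD per million tokens.
    """
    cached_price = input_price_per_mtok * (1.0 - CACHE_READ_DISCOUNT)
    total = (
        fresh_input_tokens * input_price_per_mtok
        + cached_input_tokens * cached_price
        + output_tokens * output_price_per_mtok
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder prices, chosen only to make the arithmetic easy to follow.
    cost = request_cost_usd(
        fresh_input_tokens=100_000,
        cached_input_tokens=400_000,
        output_tokens=50_000,
        input_price_per_mtok=1.00,
        output_price_per_mtok=4.00,
    )
    print(f"${cost:.2f}")
```

With these placeholder prices, the 400k cached tokens cost less than the 100k fresh ones, which is why long agentic sessions with heavy cache reuse drain the weekly bar more slowly than their raw token counts suggest.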

These limits deplete and recharge like a stamina or mana bar in a video game, rather than emptying completely and refreshing only at the end of the window, so you are never completely locked out. The recharge percentage is calibrated so that a steady, not-too-small trickle of work can continue on the recharge increment alone if you’re truly desperate.
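The refill times implied by the table can be checked with a short sketch. It assumes the recharge is applied in discrete, equal steps and capped at capacity; how Synthetic actually implements the bars server-side is not stated here, and the function names are illustrative.

```python
def minutes_to_refill(step_percent: int, step_interval_minutes: int) -> int:
    """Minutes for an empty bar to refill, given a fixed percentage step."""
    steps = -(-100 // step_percent)  # ceiling division: steps needed to reach 100%
    return steps * step_interval_minutes


def level_after(level: float, capacity: float, step_percent: int, steps: int) -> float:
    """Bar level after a number of recharge steps, capped at capacity."""
    recharged = capacity * step_percent / 100 * steps
    return min(capacity, level + recharged)


if __name__ == "__main__":
    # Request bar: 5% every 3 minutes.
    print(minutes_to_refill(5, 3), "minutes")
    # Weekly bar: 2% every 3 hours (180 minutes).
    print(minutes_to_refill(2, 180) / 60, "hours")
    # Empty single-pack request bar (500 requests) after 4 steps, i.e. 12 minutes.
    print(level_after(0, 500, 5, 4), "requests")
```

Under these assumptions the request bar refills from empty in 60 minutes and the weekly bar in 150 hours (about 6.25 days), both inside the 5-hour and 7-day windows they are named after.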

Even if you use your weekly limit up in the middle of the day, if you wait for 3 hours, you’ll have $0.72 of tokens, assuming you have a single pack. That might not sound like much, but it equals 500k 3:1 mixed input and output tokens.
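The figures in this example can be reproduced with a couple of one-line helpers (names are illustrative). With the table's values ($24.00 per pack, 2% per step) a single step comes to $0.48; the $0.72 quoted above matches a 3% step instead, so the step size is left as a parameter rather than assumed. The blended price is simply what $0.72 for 500k tokens implies, not a published rate.

```python
def recharge_step_usd(weekly_budget_usd: float, step_percent: float) -> float:
    """Dollar value restored to the weekly bar by one recharge step."""
    return weekly_budget_usd * step_percent / 100


def implied_price_per_mtok(budget_usd: float, tokens: int) -> float:
    """Blended USD price per million tokens implied by a budget and token count."""
    return budget_usd / tokens * 1_000_000


if __name__ == "__main__":
    print(f"2% of $24.00: ${recharge_step_usd(24.00, 2):.2f}")
    print(f"3% of $24.00: ${recharge_step_usd(24.00, 3):.2f}")
    price = implied_price_per_mtok(0.72, 500_000)
    print(f"Implied blended price: ${price:.2f} per million tokens")
```

The implied blended rate of about $1.44 per million tokens covers a 3:1 input-to-output mix, so cheaper models or heavier cache reuse would stretch the same recharge step further.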

The purpose of each of these bars is different:

The hourly requests bar is designed to keep burst usage from getting too high, so that Synthetic’s GPUs don’t get overloaded and slow down Tokens Per Second and Time To First Token for everyone.

The weekly limit is designed to ensure that nobody is able to use so much compute — which costs money, via electricity prices — that their subscription becomes too unprofitable for Synthetic.

Profitability

Since you’re paying $30/mo for $24/wk worth of tokens, if you fully used your subscription to the maximum every week, you would already be very unprofitable for Synthetic.
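To put a number on that gap, the sketch below converts the weekly budget to a monthly figure and finds the average utilization at which the tokens used are worth exactly the subscription price. It values tokens at API list prices, as the weekly bar does; Synthetic's actual cost per token is not given here, so this compares list-price value against the subscription price rather than measuring real profit.

```python
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12


def max_monthly_value_usd(weekly_budget_usd: float = 24.00) -> float:
    """Token value available per month if the weekly bar is fully used."""
    return weekly_budget_usd * WEEKS_PER_YEAR / MONTHS_PER_YEAR


def break_even_utilization(
    price_usd_per_month: float = 30.00, weekly_budget_usd: float = 24.00
) -> float:
    """Fraction of the weekly bar at which usage equals the subscription price."""
    return price_usd_per_month / max_monthly_value_usd(weekly_budget_usd)


if __name__ == "__main__":
    print(f"Max monthly value: ${max_monthly_value_usd():.2f}")
    print(f"Break-even utilization: {break_even_utilization():.1%}")
```

On these figures a fully used pack draws about $104 of list-price tokens per month against a $30 price, and the two are equal at roughly 29% average utilization.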

However, very few people actually do that. Users who consume well under the value of their subscription are net-profitable for Synthetic, and they subsidize the very few users who overuse theirs.

Additionally, those most expensive users — unless they’re using some kind of automation like OpenClaw, which the hourly limit prevents — are rarely able to actually consistently overuse their subscription, simply because humans don’t have that kind of stamina and consistent work schedule. Therefore, the weeks or months where you underutilize your subscription also subsidize the weeks or months where you overutilize it!

The benefit for everyone, of course, even those who underutilize their subscription, is the option to overutilize when needed, along with consistent, predictable pricing.

As to whether they are currently profitable, as of April 8th, 2026, one of the paid interns on the Discord had this to say:

“[…] what i can say is: we’re margin-profitable on our GPUs finally [this includes subscriptions and API costs], and with ~2x growth we think we’d probably just be profitable overall (i.e. covering existing salaries, office space, AWS bill).”

LLM Inference Unit Economics Resources

LLM Inference Economics from First Principles (use this to understand what calculations to do, meaning the underlying principles, equations, and factors in play, not to understand the actual values in play or, as a result, the economics of inference in 2026: nearly lossless quants like NVFP4 and much more powerful, higher-HBM GPUs like the B200 completely change the values at play, which in turn completely changes the economics).

Are OpenAI and Anthropic Really Losing Money on Inference? (use this to get a ballpark idea of how much it costs to run models closer to the present-day, and how that compares to subscriptions).

LLMs Are Cheap (use this to get a sense of the market for open weight models).

Introducing NVFP4 for Efficient and Accurate Low-Precision Inference

Quantization from the ground up

Unpacking the deceptively simple science of tokenomics

AI inference costs dropped up to 10x on Nvidia's Blackwell — but hardware is only half the equation

Additional Limits

Synthetic also provides unlimited access to hf:nomic-ai/nomic-embed-text-v1.5, an embedding model, and to two LoRAs they trained for octofriend: hf:syntheticlab/diff-apply and hf:syntheticlab/fix-json. Like any model from Synthetic, they’re usable in any harness. diff-apply applies find-replace style diffs to code, and fix-json fixes broken JSON tool calls. Both require particular input/output formats, so be sure to check the model cards.
