
Synthetic Limits and Pricing
Additional Limits
Rate-Limit Evolution

Synthetic Limits and Pricing

Subscription Pricing

The Synthetic subscription — not to be confused with usage-based pricing — works in terms of “packs”.

A base subscription is $30/mo, and corresponds to one “pack.”

This subscription provides a fixed allowance of requests per 5 hours, a weekly token limit, and a concurrency limit.

If you add a pack, you’re essentially adding a second $30/mo subscription: you get double the capacity of both the 5-hour and weekly limits, double the recharge rate (since recharge is a percentage of capacity), double the concurrency, and double the price. Add a third pack, and the price, limits, recharge rate, and concurrency are all triple the original.

You can do this for up to 5 packs in total.
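Because every limit scales linearly with the number of packs, the arithmetic above can be sketched as a small Python function. This is an illustrative helper, not Synthetic's code: `pack_limits` and `PackLimits` are hypothetical names, the per-pack numbers come from the Subscription Limits table, and it assumes all packs are the same type and that founder’s packs get the same concurrency as standard ones (the document doesn't say otherwise).

```python
from dataclasses import dataclass

# Per-pack values from the Subscription Limits table in this document.
STANDARD_PACK = {"requests_per_5h": 500, "weekly_token_usd": 24.00}
FOUNDER_PACK = {"requests_per_5h": 750, "weekly_token_usd": 36.00}
PRICE_PER_PACK_USD = 30.00
MAX_PACKS = 5


@dataclass(frozen=True)
class PackLimits:
    """Hypothetical summary of a subscription's scaled limits."""
    packs: int
    monthly_price_usd: float
    requests_per_5h: int
    weekly_token_usd: float
    concurrency: int                # regular models: 1 per pack
    small_model_concurrency: int    # "small models": 2 per pack
    requests_per_recharge: float    # 5% of capacity every 3 minutes
    tokens_usd_per_recharge: float  # 2% of capacity every 3 hours


def pack_limits(packs: int, founder: bool = False) -> PackLimits:
    """Scale every limit linearly with the number of packs.

    Assumes all packs are the same type; mixing standard and
    founder's packs is not modeled. Concurrency is assumed to be
    the same for both pack types.
    """
    if not 1 <= packs <= MAX_PACKS:
        raise ValueError(f"packs must be between 1 and {MAX_PACKS}")
    base = FOUNDER_PACK if founder else STANDARD_PACK
    requests = base["requests_per_5h"] * packs
    weekly = base["weekly_token_usd"] * packs
    return PackLimits(
        packs=packs,
        monthly_price_usd=PRICE_PER_PACK_USD * packs,
        requests_per_5h=requests,
        weekly_token_usd=weekly,
        concurrency=packs,
        small_model_concurrency=2 * packs,
        requests_per_recharge=requests * 0.05,
        tokens_usd_per_recharge=weekly * 0.02,
    )
```

For example, `pack_limits(3)` describes a $90/month subscription with 1,500 requests per 5 hours, $72/week of tokens, and 3 concurrent requests for regular models.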

Founder's Packs

Founder’s packs are worth 50% more than standard packs — $36/week in tokens and 750 requests per 5 hours (vs $24/week and 500 for standard). Same $30/mo price.

Subscription Limits

Your subscription has two recharging limits, plus a concurrency cap, all scaled by how many packs you have. Each recharging limit regains a fixed percentage of its total capacity at a fixed interval, and the rates are chosen so that an empty bar refills completely within the time frame it represents (5 hours or 1 week). See the following table:

Limit Type	Capacity (Per Pack)	Recharge Rate	Primary Purpose
5-Hour Requests	500 requests (weighted by output token cost)	5% every 3 minutes	Prevents hardware/GPU overload
Weekly Tokens	$24.00 worth of compute (uses API prices + 80% cache-read discount)	2% every 3 hours	Ensures long-term sustainability
Concurrency	1 request at a time (2 for “small models”: Nemotron 3 Super & GLM-4.7-Flash)	N/A	Caps simultaneous requests; each pack adds more slots
Price	$30.00 / month	N/A	Subsidizes “power users”

Founder’s packs: $36/week tokens and 750 requests/5 hours instead of the standard values above.
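What happens to requests beyond your concurrency allowance isn't described here, so a cautious client can cap in-flight requests itself. The sketch below does that with `asyncio.Semaphore`; `ConcurrencyGate`, `fake_request`, and `run_batch` are hypothetical names, and `fake_request` stands in for a real API call.

```python
import asyncio


class ConcurrencyGate:
    """Client-side cap on in-flight requests (hypothetical helper)."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous requests observed

    async def run(self, make_request, *args):
        # Wait for a free slot, then run the request while holding it.
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await make_request(*args)
            finally:
                self.in_flight -= 1


async def fake_request(prompt: str) -> str:
    """Stand-in for a real API call; it just waits briefly."""
    await asyncio.sleep(0.01)
    return f"done: {prompt}"


async def run_batch(prompts, limit: int):
    """Send all prompts, never more than `limit` at once."""
    gate = ConcurrencyGate(limit)
    results = await asyncio.gather(*(gate.run(fake_request, p) for p in prompts))
    return results, gate.peak
```

With one standard pack you would pass `limit=1` for regular models; with three packs, `limit=3`, or `limit=6` when calling only the “small models.”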

These limits deplete and recharge like a stamina or mana bar in a video game, rather than emptying out completely and refreshing only at the end of the time window, so you are never completely locked out. Each recharge increment is sized so that, if you’re truly desperate, the increments alone sustain a steady, not-too-small trickle of work.

Even if you use up your weekly limit in the middle of the day, waiting 3 hours restores 2% of it: $0.48 of tokens with a single standard pack, or $0.72 with a single founder’s pack. That might not sound like much, but $0.72 equals roughly 500k tokens at a 3:1 input-to-output mix (the exact count depends on the model’s prices).
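The stamina-bar behavior can be modeled as a bar that drains on use and regains a fixed fraction of its capacity at the end of each interval. This is a minimal sketch, assuming recharge happens in discrete steps as the table’s “X% every Y” wording suggests (Synthetic’s actual accounting may differ); `RechargingBar` is a hypothetical name. With a standard pack ($24/week, 2% every 3 hours), one step restores $0.48; with a founder’s pack ($36/week), it restores $0.72.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RechargingBar:
    """Stamina-style quota: drains on use and regains a fixed fraction of
    its capacity at the end of every full interval, capped at capacity.

    Time is tracked in whole minutes to avoid floating-point drift.
    Discrete recharge steps are an assumption of this sketch.
    """
    capacity: float
    recharge_fraction: float
    interval_minutes: int
    level: Optional[float] = None  # starts full when omitted
    _carry: int = field(default=0, init=False, repr=False)  # minutes toward next step

    def __post_init__(self) -> None:
        if self.level is None:
            self.level = self.capacity

    def consume(self, amount: float) -> bool:
        """Spend `amount` if the bar holds enough; otherwise spend nothing."""
        if amount > self.level:
            return False
        self.level -= amount
        return True

    def advance(self, minutes: int) -> None:
        """Let time pass, applying one recharge step per completed interval."""
        self._carry += minutes
        steps, self._carry = divmod(self._carry, self.interval_minutes)
        step_size = self.capacity * self.recharge_fraction
        self.level = min(self.capacity, self.level + steps * step_size)
```

The same class covers the 5-hour bar, e.g. `RechargingBar(500, 0.05, 3)`, where each call to `consume` would pass the request’s weight rather than 1.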

The purpose of each of these bars is different:

The 5-hour requests bar keeps burst usage in check, so that Synthetic’s GPUs don’t get overloaded and tokens per second (TPS) and time to first token (TTFT) don’t degrade for everyone.

The weekly limit ensures that nobody can use so much compute (which costs Synthetic real money in GPU time and electricity) that their subscription becomes too unprofitable for Synthetic.

History: Rate Limit Changes

Synthetic overhauled rate limits in April 2026 after a 3-week opt-in experiment. Key changes:

Replaced daily tool call quota with weekly token-based quota — the old system counted tool calls separately (500/day), which meant you could exhaust your tool calls while still having regular request budget unused. Token-based counting is more flexible and fairer: less load = less limiting, more load = more limiting.
5-hour request limit increased from 135 to 500 per pack — previously max ~1,148 requests/day per pack (135 × 24/5 + 500 tool calls). Now 2,400 requests/day per pack. Founder’s packs: 750 per 5 hours = 3,600/day.
Weekly token quota introduced at $24/week per pack — guaranteed to always be better value than PAYG API pricing. Previously, some usage patterns were actually cheaper on PAYG than subscription.
Continuous regeneration — hitting your weekly quota doesn’t lock you out for a week. Wait a day and one day’s worth regenerates automatically.
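The daily ceilings in the second item follow from simple arithmetic, sketched here. Like the quoted figures, it assumes the whole 5-hour budget is spent once per 5-hour window (24 / 5 = 4.8 windows per day) and ignores recharge within a window; `max_daily_requests` is a hypothetical helper.

```python
def max_daily_requests(per_5h: int, daily_tool_calls: int = 0) -> float:
    """Daily ceiling: the 5-hour budget times windows per day (24 / 5),
    plus any separately counted daily tool calls (old system only)."""
    return per_5h * 24 / 5 + daily_tool_calls


old_system = max_daily_requests(135, daily_tool_calls=500)  # 1148.0
new_standard = max_daily_requests(500)                      # 2400.0
new_founder = max_daily_requests(750)                       # 3600.0
```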

Cache Hit Reporting

Cache hits and misses are now reported in API responses using the standard OpenAI and Anthropic response formats. This lets you see exactly how many of your input tokens were served from cache (and thus discounted 80%).

The 80% cache-read discount on the weekly token quota is subscription-only for now. Synthetic has stated they plan to roll cache discounting out to PAYG in the future, but currently PAYG users pay full price for all tokens, including cache hits.
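To see how cache hits affect what a request deducts from the weekly quota, the sketch below reads the usage block of a response and applies the 80% cache-read discount. It assumes Synthetic fills in the standard fields: `prompt_tokens_details.cached_tokens` in OpenAI-style responses (where `prompt_tokens` already includes cached tokens) and `cache_read_input_tokens` in Anthropic-style responses (where `input_tokens` excludes them). Treating Anthropic cache writes (`cache_creation_input_tokens`) as ordinary input is also an assumption, the prices are whatever you pass in, and the function names are hypothetical.

```python
from typing import Tuple

CACHE_READ_DISCOUNT = 0.80  # subscription-only, per this document


def split_usage(usage: dict) -> Tuple[int, int, int]:
    """Return (uncached_input, cached_input, output) token counts."""
    if "prompt_tokens" in usage:
        # OpenAI-style: prompt_tokens already includes the cached tokens.
        details = usage.get("prompt_tokens_details") or {}
        cached = details.get("cached_tokens", 0) or 0
        return usage["prompt_tokens"] - cached, cached, usage.get("completion_tokens", 0)
    # Anthropic-style: input_tokens excludes cache reads and cache writes.
    # Cache writes are treated as ordinary input here (an assumption).
    uncached = usage.get("input_tokens", 0) + (usage.get("cache_creation_input_tokens") or 0)
    cached = usage.get("cache_read_input_tokens") or 0
    return uncached, cached, usage.get("output_tokens", 0)


def quota_cost_usd(usage: dict, input_per_m: float, output_per_m: float) -> float:
    """Dollars deducted from the weekly quota, with cache reads at 20% of input price."""
    uncached, cached, output = split_usage(usage)
    return (
        uncached * input_per_m
        + cached * input_per_m * (1 - CACHE_READ_DISCOUNT)
        + output * output_per_m
    ) / 1_000_000
```

With hypothetical prices of $1.00/M input and $3.40/M output, a request with 100k input tokens (80k of them cached) and 10k output tokens deducts $0.07, versus $0.134 with no cache hits.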

See prompt_caching for details on how the cache works and why hit rates vary.

Request Cost Scaling

Synthetic scales the “cost” of a request according to how expensive the requested model is under PAYG pricing. A model whose output tokens cost $3.40/million counts as exactly 1.0 requests. MiniMax M2.5, for example, counts as 0.59 requests ($2.00/$3.40) because its output tokens are cheaper. The reference price puts Kimi K2.5 at 1.0 requests, and most other models cost less.
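The weighting rule above reduces to one division. This is a minimal sketch, assuming the weight is simply the model’s PAYG output price divided by $3.40 with no rounding or cap (the document doesn’t say whether models pricier than the reference count as more than 1.0); the function names are hypothetical.

```python
# Reference point from this document: a model whose output tokens cost
# $3.40 per million counts as exactly 1.0 requests.
REFERENCE_OUTPUT_PRICE_PER_M = 3.40


def request_weight(output_price_per_m: float) -> float:
    """Requests deducted from the 5-hour bar per call (assumed uncapped)."""
    return output_price_per_m / REFERENCE_OUTPUT_PRICE_PER_M


def calls_per_budget(budget: float, output_price_per_m: float) -> float:
    """How many calls a request budget covers for a single model."""
    return budget / request_weight(output_price_per_m)
```

On one pack’s 500-request budget, that works out to roughly 850 MiniMax M2.5 calls versus 500 Kimi K2.5 calls.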

Profitability

Since you’re paying $30/mo for $24/wk worth of tokens (about $104/month at API prices), if you used your subscription to the maximum every week, you would already be very unprofitable for Synthetic.
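The gap is easier to see with the numbers worked out. This sketch converts the weekly quota into a monthly figure (52 weeks / 12 months) and computes the fraction of the quota at which its value at API prices equals the $30 price. It is a rough illustration only: it says nothing about Synthetic’s actual GPU costs, and it ignores that cache reads count at a discount against the quota. The function names are hypothetical.

```python
def monthly_token_value_usd(weekly_usd: float) -> float:
    """Token value per month at API prices, if the weekly quota is fully used."""
    return weekly_usd * 52 / 12


def payg_breakeven_fraction(monthly_price_usd: float, weekly_usd: float) -> float:
    """Fraction of the weekly quota at which the quota's value,
    at API prices, equals the monthly subscription price."""
    return monthly_price_usd / monthly_token_value_usd(weekly_usd)


standard_value = monthly_token_value_usd(24.0)            # 104.0
standard_breakeven = payg_breakeven_fraction(30.0, 24.0)  # about 0.288
founder_breakeven = payg_breakeven_fraction(30.0, 36.0)   # about 0.192
```

So a standard pack’s quota is worth about $104/month at API prices, and using more than roughly 29% of it (about 19% for a founder’s pack) means those tokens would have cost more than $30 on PAYG.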

However, very few people actually do that. Users who consume well under their subscription’s value are net-profitable for Synthetic, and they subsidize the very few users who use more than they pay for.

Additionally, unless they’re running automation like OpenClaw (which the 5-hour limit keeps in check), even the most expensive users are rarely able to overuse their subscription consistently, simply because humans don’t have that kind of stamina and work schedule. Therefore, the weeks or months when you underuse your subscription also subsidize the weeks or months when you overuse it!

The benefit for everyone, even those who underuse their subscription, is the headroom to overuse it when needed, plus consistent, predictable pricing.

As for whether Synthetic is currently profitable, one of the paid interns on the Discord had this to say on April 8, 2026:

“[…] what i can say is: we’re margin-profitable on our GPUs finally [this includes subscriptions and API costs], and with ~2x growth we think we’d probably just be profitable overall (i.e. covering existing salaries, office space, AWS bill).”

LLM Inference Unit Economics Resources

LLM Inference Economics from First Principles (use this to learn which calculations to do: the underlying principles, equations, and factors in play. Don’t rely on it for the actual values, or for the economics of inference in 2026, because near-lossless quantization formats like NVFP4 and far more powerful, higher-HBM GPUs like the B200 change those values, and with them the economics).

Are OpenAI and Anthropic Really Losing Money on Inference? (use this to get a ballpark idea of what it costs to run models closer to the present day, and how that compares to subscriptions).

LLMs Are Cheap (use this to get a sense of the market for open weight models).

Introducing NVFP4 for Efficient and Accurate Low-Precision Inference

Quantization from the ground up

Unpacking the deceptively simple science of tokenomics

AI inference costs dropped up to 10x on Nvidia's Blackwell — but hardware is only half the equation

Additional Limits

Synthetic also provides unlimited access to Nomic Embed Text 1.5, an embedding model, and to two LoRAs they trained for octofriend: hf:syntheticlab/diff-apply and hf:syntheticlab/fix-json. Like any model from Synthetic, they’re usable in any harness. diff-apply applies find-and-replace style diffs to code, and fix-json repairs broken JSON tool calls. Both require particular input/output formats, so be sure to check the model cards.

Rate-Limit Evolution

This section is based on public statements from Synthetic staff. The rate-limiting system has gone through several iterations in response to abuse vectors.

Phase	System	Problem
v1	X requests per 5 hours + free tool calls	Users formatted any request as a tool call to get free requests. 3 users consumed >1/3 of total capacity.
v2	Tool calls count as percentage of requests (e.g. 10%)	Percentage-based discount could still be abused for ~10x the quota.
v3 (current)	Weekly token quota ($24/week per pack) + 500 requests per 5 hours	Token-based weekly limit eliminates tool call abuse. Requests weighted by output token cost.

The rate-limit-v3 experiment launched on April 7, 2026 after three weeks of opt-in testing. Key changes:

5-hour requests: 500 per pack (up from 135), weighted by output token cost
Weekly tokens: $24.00 worth of compute per pack (replaces daily tool call limits)
Tool calls: No longer separately counted — all usage flows through the weekly token quota
Founder’s packs: 50% more ($36/week tokens, 750 requests/5 hours)
Concurrency: 1 at a time per pack (2 for “small models”: Nemotron 3 Super & GLM-4.7-Flash)

The weekly token quota means you don’t need to think about “saving” tool calls vs regular requests anymore. Everything is just tokens. Unused 5-hour requests may eventually get “refunded” into the weekly token quota (proposed but not yet confirmed).

Why Request-Based, Not Token-Based?

Synthetic chose request-based limits over pure token-based limits for simplicity:

Token-based pricing encourages gaming (deleting conversation history to save quota, splitting context)
Request count follows a predictable pattern relative to cost
Pairing the request-based 5-hour limit with the weekly token quota mitigates the worst of both approaches
