Before you ask this, make sure it:
Factors that can delay Synthetic getting a model:
Additionally, keep in mind that labs known for creating open-weight models sometimes keep a model closed source (as with Qwen 3.6-Plus), or offer it only through the lab's own API for user testing and feedback (and to give the lab a profitable head start) before releasing the weights. This was the case with GLM 5.1 for a few weeks and remains true as of April 9th, 2026, for MiniMax M2.7.
M2.7 open weights became available on April 11. However, they’re only available under a non-commercial license. Synthetic cannot host M2.7 unless MiniMax agrees to a business partnership.
The software you use with Synthetic has a massive impact on your token “burn rate.”
Recommendation: Use any other agent. We strongly recommend against using Claude Code as a coding harness with Synthetic. Its underlying infrastructure is notoriously inefficient, causing excessive token bloat. Using Claude Code will drain your credits much faster than almost any other option.
While OpenCode is significantly better than Claude Code, it is still not optimized for efficiency. If you are using the oh-my-opencode/openagent (OMO) extension, the problem is significantly worse: OMO launches unnecessary, poorly designed subagent workflows that bloat every single prompt with redundant context, leading to a “death by a thousand tokens” scenario.
Zed is a powerful editor, but its real-time "live-edit" feature comes at a cost. Zed uses a two-step process for edits:

1. The model first sends a request declaring its intent to edit, along with the relevant context.
2. A second request, carrying that same context again, then generates the edit itself.
Because of this “intent to edit” system, you are essentially using 2x the input tokens and 2x the requests for every single edit. If this workflow is necessary for you, you may need more packs to sustain it.
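To see what this doubling means in practice, here is a rough back-of-the-envelope sketch; the context size and edit count below are made-up illustrative numbers, not measurements of Zed:

```python
def edit_cost(context_tokens: int, edits: int, requests_per_edit: int = 2) -> int:
    """Estimate input tokens consumed by a session of live edits.

    A two-step flow (intent request + edit request) resends the context
    twice per edit, so requests_per_edit defaults to 2. A single-step
    harness would pass 1.
    """
    return context_tokens * requests_per_edit * edits

# A hypothetical 30k-token context edited 50 times in a session:
two_step = edit_cost(30_000, 50)        # 3,000,000 input tokens
single_step = edit_cost(30_000, 50, 1)  # 1,500,000 input tokens
```

The absolute numbers will vary with your project, but the 2x ratio between the two flows holds regardless of context size.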
OpenClaw (and similar harnesses) run prompts on cron jobs and regularly scheduled heartbeats, which can burn a lot of tokens in the background without you doing anything. These background tasks are often relatively long-horizon, involving many MCPs and tool calls, and the OpenClaw system prompt is not exactly small, so the impact can be substantial.
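The effect of heartbeat frequency compounds quickly. A quick estimate (the interval and per-run token count below are hypothetical, not OpenClaw defaults):

```python
def daily_background_tokens(interval_minutes: int, tokens_per_run: int) -> int:
    """Tokens burned per day by a task scheduled every interval_minutes."""
    runs_per_day = (24 * 60) // interval_minutes
    return runs_per_day * tokens_per_run

# A heartbeat every 5 minutes, each run sending ~20k tokens of
# system prompt + tool definitions + context:
print(daily_background_tokens(5, 20_000))   # 288 runs/day -> 5,760,000 tokens
# The same task run hourly instead:
print(daily_background_tokens(60, 20_000))  # 24 runs/day -> 480,000 tokens
```

Stretching the interval from 5 minutes to an hour cuts the background burn by 12x before you change anything else.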
If your tools aren’t known for being wasteful but your usage remains high, follow these steps (roughly in order of recommendation) to reduce token bloat:
Reduce the frequency of automated workflow runs. For example, if you use OpenClaw, review your current tasks in OpenClaw’s “heartbeat” function and increase the interval between checks.
Refine your input tokens. Use concise system prompts and AGENTS.md files.
Switch to a cheaper model for simpler tasks; Kimi and GLM don’t need to be running for every single prompt.
Let your top-level chat run on a more expensive model like Kimi or GLM, but have it orchestrate subagents in series (not in parallel) that use cheaper models like MiniMax or even Nemotron to execute scoped, fully specified tasks such as editing code or exploring the codebase.
This gets you the better model's superior planning, problem solving, prompt and project understanding, and code review, without burning its tokens on grepping around your codebase or iterating on a piece of code to satisfy the compiler, linter, or test suite, while still keeping the bigger model available for problems the smaller ones can't handle.
For less complex tasks, reduce the model’s budget for thinking. This forces the model to be more direct and prevents it from using tokens on unnecessary internal reasoning.
If your workflow is already lean but you still hit limits, it may be time to upgrade your number of packs to match your professional output.
Many of these lists are updated by hand by a human, so you might be the first $preferred_harness user who's noticed one's missing! You can either:
- open an issue or PR against $preferred_harness's list,
- find the upstream list $preferred_harness sources its data from and open that PR yourself, or
- replace $preferred_harness with a list that updates more frequently.