Frequently Asked Questions
When will Synthetic get model X?
Before you ask this, make sure it:
- Is open weight or open source.
- Has weights available on HuggingFace (these are not always the same thing: sometimes a model is planned to be open weight, but has not been released yet).
- Has a compatible license that allows Synthetic to actually make money hosting a model (sometimes model weights are published “openly,” but only under modified OSS licenses that require royalties over a certain profitability limit, for instance).
- Has an NVFP4 quantization available so that Synthetic can run it on their GPUs at optimal speed (there are some exceptions to this — if a model is sufficiently desired, they may make their own quant).
- Has solid support for that model or its general architecture in sglang, the inference engine Synthetic uses to actually run the models.
Factors that can delay Synthetic getting a model:
- If it is unusually large, it may take time for them to acquire or free up GPU space to host it.
- If it has a novel or unusual architecture (such as DeepSeek Sparse Attention for GLM 5), it will take time for inference engines like sglang to get reliable support for the model.
- If the model has not yet been quantized to NVFP4, Synthetic will have to wait for NVIDIA to do that, or make one themselves, both of which can take some time.
Additionally, it is worth keeping in mind that many models from labs known for creating open-weight models may either be closed source—such as Qwen 3.6-Plus—or available only through the lab’s API for user testing and feedback (and to give the lab a profitable head start) but not yet released as open weights. This was the case with GLM 5.1 for a few weeks and remains true as of April 9th, 2026, for MiniMax M2.7.
When will Synthetic get MiniMax M2.7?
M2.7 open weights became available on April 11. However, they’re only available under a non-commercial license. Synthetic cannot host M2.7 unless MiniMax agrees to a business partnership.
Why am I burning through my credits or requests so quickly?
Step 1: Check Your Tools
The software you use with Synthetic has a massive impact on your token “burn rate.”
1. Are you using Claude Code?
Recommendation: Use any other agent. We strongly recommend against using Claude Code as a coding harness with Synthetic. Its underlying infrastructure is notoriously inefficient, causing excessive token bloat. Using Claude Code will drain your credits much faster than almost any other option.
2. Are you using OpenCode (with oh-my-opencode/oh-my-openagent)?
While OpenCode is significantly better than Claude Code, it is still not optimized for efficiency. If you are using the oh-my-opencode/openagent (OMO) extension, the problem is significantly worse: OMO launches unnecessary, poorly designed subagent workflows that bloat every single prompt with redundant context, leading to a “death by a thousand tokens” scenario.
3. Are you using Zed?
Zed is a powerful editor, but its real-time “live-edit” feature comes at a cost. Zed utilizes a two-step process for edits:
- Intent to Edit: The main chat (the one you’re talking to) sends a tool call declaring that it wants to edit a file and defining the goal of the edit.
- Execution: Zed receives that tool call and runs a separate request with the same chat history to generate the code in a streaming manner (for the live-diff to work).
Because of this “intent to edit” system, you are essentially using 2x the input tokens and 2x the requests for every single edit. If this workflow is necessary for you, you may need more packs to sustain it.
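To see why this adds up, here is a back-of-envelope sketch of the two-request flow. All token counts are hypothetical placeholders, not measurements of Zed itself:

```python
# Rough illustration of a two-step "intent to edit" flow.
# The numbers are made up; real usage depends on your chat history size.

def edit_flow_cost(history_tokens: int, intent_tokens: int, code_tokens: int) -> dict:
    """Estimate tokens for one live-edit in a two-request flow."""
    # Request 1: the main chat emits an "intent to edit" tool call.
    request_1_input = history_tokens
    request_1_output = intent_tokens
    # Request 2: the same history is re-sent so a second request can
    # stream the actual code for the live diff.
    request_2_input = history_tokens + intent_tokens
    request_2_output = code_tokens
    return {
        "requests": 2,
        "input_tokens": request_1_input + request_2_input,
        "output_tokens": request_1_output + request_2_output,
    }

# With a 20k-token history, one edit costs ~40k input tokens instead of ~20k.
cost = edit_flow_cost(history_tokens=20_000, intent_tokens=150, code_tokens=800)
print(cost["requests"], cost["input_tokens"])
```

The input-token cost roughly doubles per edit because the entire chat history rides along on both requests.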
4. Are you using OpenClaw?
OpenClaw (and similar harnesses) runs prompts on cron jobs, as well as regularly scheduled heartbeats, which can consume a lot of tokens in the background without you doing anything. These tasks are often relatively long-horizon, involving many MCPs and tool calls, and the OpenClaw system prompt is not exactly small, so the impact can be substantial and hard to predict.
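A quick estimate shows how fast scheduled prompts add up. The figures below are hypothetical placeholders; check your own harness's heartbeat interval and prompt size:

```python
# Back-of-envelope estimate of "background" token burn from scheduled
# heartbeats/cron prompts. All figures are illustrative placeholders.

def daily_background_tokens(interval_minutes: int, tokens_per_run: int) -> int:
    """Tokens consumed per day by a task that fires every `interval_minutes`."""
    runs_per_day = (24 * 60) // interval_minutes
    return runs_per_day * tokens_per_run

# A heartbeat every 15 minutes whose prompt (system prompt + MCP tool
# schemas + context) weighs ~12k tokens burns over a million tokens a day.
print(daily_background_tokens(15, 12_000))   # 1,152,000
# Widening the interval to 2 hours cuts that by 8x.
print(daily_background_tokens(120, 12_000))  # 144,000
```

This is also why "Reduce Frequency" is the first optimization step below: widening a heartbeat interval is usually the cheapest win available.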
Step 2: Optimize Your Workflow
If your tools aren’t known for being wasteful but your usage remains high, follow these steps (roughly in order of recommendation) to reduce token bloat:
1. Reduce Frequency
Reduce the frequency of automated workflow runs. For example, if you use OpenClaw, review your current tasks in OpenClaw’s “heartbeat” function and increase the interval between checks.
2. Prompt Efficiency
Refine your input tokens. Use concise system prompts and AGENTS.md files.
3. Model Efficiency
Switch to a cheaper model for simpler tasks; Kimi and GLM don’t need to be running for every single prompt.
4. Serial Model Orchestration
Let your top level chat be with a more expensive model like Kimi or GLM, but have it orchestrate subagents in series (not in parallel) using cheaper models like MiniMax or even Nemotron to execute scoped, specified-in-detail tasks such as editing code or learning about the codebase.
This gives you the superior planning, problem solving, prompt and project understanding, and code review capabilities of the better model, without burning its tokens on grepping around your codebase or iterating on a piece of code until the compiler/linter/test suite is satisfied. The bigger model remains available for problems the smaller ones can't handle.
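The orchestration pattern above can be sketched as follows. `call_model` is a stand-in for whatever client your harness uses, and the model names are illustrative labels, not real identifiers:

```python
# Minimal sketch of serial model orchestration: an expensive model plans
# and reviews, cheaper models execute scoped steps one at a time.

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real API call to the named model."""
    return f"[{model}] response to: {prompt[:40]}"

def orchestrate(task: str) -> str:
    # 1. The expensive model does the planning and task decomposition.
    plan = call_model("expensive-planner", f"Break this task into steps: {task}")
    # 2. Cheaper models execute each scoped step strictly in series,
    #    so each subagent's context stays small and focused.
    results = []
    for step in plan.splitlines():
        results.append(call_model("cheap-executor", f"Do exactly this step: {step}"))
    # 3. The expensive model reviews the combined output once at the end.
    return call_model("expensive-planner", "Review these results:\n" + "\n".join(results))
```

Running subagents in series (rather than in parallel) keeps you from multiplying concurrent context windows, which is where parallel fan-out tends to burn tokens fastest.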
5. Limit "Thinking"
For less complex tasks, reduce the model’s budget for thinking. This forces the model to be more direct and prevents it from using tokens on unnecessary internal reasoning.
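How you cap thinking depends on the provider and API: Anthropic-style Messages APIs accept a `thinking.budget_tokens` field, while some OpenAI-compatible APIs expose a `reasoning_effort` parameter instead. The payload below is an illustrative sketch using the Anthropic-style shape; check your provider's documentation for the exact field names:

```python
# Sketch of capping the model's reasoning budget. The model name is a
# placeholder and the "thinking" field follows the Anthropic-style shape;
# your provider's API may use a different parameter (e.g. reasoning_effort).

def build_request(prompt: str, thinking_budget: int) -> dict:
    return {
        "model": "example-model",      # placeholder, not a real model ID
        "max_tokens": 2_048,
        "thinking": {                  # Anthropic-style thinking cap
            "type": "enabled",
            "budget_tokens": thinking_budget,
        },
        "messages": [{"role": "user", "content": prompt}],
    }

# For a simple mechanical task, a small budget keeps the model direct.
payload = build_request("Rename `foo` to `bar` in utils.py", thinking_budget=512)
```

For genuinely simple edits, some harnesses also let you disable thinking entirely, which is even cheaper than a small budget.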
6. Increase Packs
If your workflow is already lean but you still hit limits, it may be time to upgrade your number of packs to match your professional output.
Why is model X from Synthetic not in $preferred_harness's list?
Many of these lists are updated by hand by a human, so you might be the first $preferred_harness user who’s noticed one’s missing! You can:
- Wait until someone else opens a PR to $preferred_harness’s list
- Find where $preferred_harness sources its data from and open that PR yourself:
  - OpenCode has Models.dev
  - Crush has Catwalk
- Use some kind of provider-specific plugin for $preferred_harness with a list that updates more frequently
- Maintain your own provider/model list for $preferred_harness