

Frequently Asked Questions

When will Synthetic get model X?

Before you ask this, make sure it:

  1. Has weights available on HuggingFace (being announced as open weight and being released are not the same thing: sometimes a model is planned to be open weight, but its weights have not actually been published yet).
  2. Has a compatible license that allows Synthetic to actually make money hosting a model (sometimes model weights are published “openly,” but only under modified OSS licenses that require royalties over a certain profitability limit, for instance).
  3. Has an NVFP4 quantization available so that Synthetic can run it on their GPUs at optimal speed (there are some exceptions to this — if a model is sufficiently desired, they may make their own quant).
  4. Has solid support for that model or its general architecture in sglang, the inference engine Synthetic uses to actually run the models.
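The four criteria above can be sketched as a simple readiness check. The filename conventions here (".safetensors" for open weights, "NVFP4" appearing in quant repo names) are common on HuggingFace but are assumptions, as is the license allowlist — this is an illustration, not Synthetic's actual vetting process.

```python
def looks_servable(repo_files, license_id, quant_repos, supported_archs, arch):
    """Rough check of the FAQ's four criteria for a candidate model."""
    # 1. Weights actually published (not just announced).
    has_weights = any(f.endswith(".safetensors") for f in repo_files)
    # 2. License permits commercial hosting (illustrative allowlist).
    permissive = license_id in {"apache-2.0", "mit"}
    # 3. An NVFP4 quantization exists somewhere.
    has_nvfp4 = any("nvfp4" in r.lower() for r in quant_repos)
    # 4. The inference engine supports the architecture.
    has_engine_support = arch in supported_archs
    return has_weights and permissive and has_nvfp4 and has_engine_support
```

If any of the four checks fails, the question is not "when" but "whether" — a restrictive license or a missing release can block hosting indefinitely.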

Factors that can delay Synthetic getting a model:

  • If it is unusually large, it may take time for them to acquire or free up GPU space to host it.
  • If it has a novel or unusual architecture (such as DeepSeek Sparse Attention for GLM 5), it will take time for inference engines like sglang to get reliable support for the model.
  • If the model has not yet been quantized to NVFP4, Synthetic will have to wait for NVIDIA to do that, or make one themselves, both of which can take some time.

Additionally, it is worth keeping in mind that many models from labs known for creating open-weight models may either be closed source—such as Qwen 3.6-Plus—or available only through the lab’s API for user testing and feedback (and to give the lab a profitable head start) but not yet released as open weights. This was the case with GLM 5.1 for a few weeks and remains true as of April 9th, 2026, for MiniMax M2.7.

Why is model X from Synthetic not in $preferred_harness's list?

Many of these lists are updated by hand by a human, so you may be the first $preferred_harness user to notice that one is missing! You can either:

  • Wait until someone else opens a PR to $preferred_harness’s list
  • Find where $preferred_harness sources its data from and open that PR yourself
  • Use some kind of provider-specific plugin for $preferred_harness with a list that updates more frequently
  • Maintain your own provider/model list for $preferred_harness
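The last option can be as simple as a script that merges your own entries into whatever model list the harness reads. The file shapes and model metadata below are hypothetical — every harness stores this differently, so treat it as a sketch of the merge step only.

```python
import json

# Hypothetical base list shipped with the harness.
harness_models = {"synthetic/glm-5": {"context": 131072}}

# Your own additions; the context size here is made up for illustration.
my_models = {"synthetic/minimax-m2.7": {"context": 196608}}

def merge_model_lists(base, extra):
    """Return a new dict with `extra` entries added; `extra` wins on conflicts."""
    merged = dict(base)
    merged.update(extra)
    return merged

combined = merge_model_lists(harness_models, my_models)
print(json.dumps(combined, indent=2))
```

Letting your own entries win on conflicts means you can also override stale metadata (e.g. an outdated context length) without waiting for an upstream PR.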

Why am I burning through my credits or requests so quickly?

Step 1: Check Your Tools

The software you use with Synthetic has a massive impact on your token “burn rate.”

1. Are you using Claude Code?

Recommendation: Use any other agent. We strongly recommend against using Claude Code as a coding harness with Synthetic. Its underlying infrastructure is notoriously inefficient, causing excessive token bloat. Using Claude Code will drain your credits much faster than almost any other option.

2. Are you using OpenCode (with oh-my-opencode/oh-my-openagent)?

While OpenCode is significantly better than Claude Code, it is still not optimized for efficiency. If you are using the oh-my-opencode/openagent (OMO) extension, the problem is significantly worse: OMO launches unnecessary, poorly designed subagent workflows that bloat every single prompt with redundant context, leading to a “death by a thousand tokens” scenario.

3. Are you using Zed?

Zed is a powerful editor, but its real-time “live-edit” feature comes at a cost. Zed utilizes a two-step process for edits:

  1. Intent to Edit: The main chat (the one you’re talking to) sends a tool call declaring that it wants to edit a file and defining the goal of the edit.
  2. Execution: Zed receives that tool call and runs a separate request with the same chat history to generate the code in a streaming manner (for the live-diff to work).

Because of this “intent to edit” system, you are essentially using 2x the input tokens and 2x the requests for every single edit. If this workflow is necessary for you, you may need more packs to sustain it.
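The doubling can be sketched with some rough accounting. The numbers are illustrative only — this is not Zed's actual billing logic, just the shape of the cost.

```python
def zed_edit_cost(history_tokens, edit_output_tokens):
    """Approximate cost of one Zed live-edit under the two-step workflow."""
    # Step 1: intent-to-edit tool call, sent with the full chat history.
    intent_input = history_tokens
    # Step 2: separate streaming request, again carrying the same history.
    execution_input = history_tokens
    return {
        "requests": 2,
        "input_tokens": intent_input + execution_input,
        "output_tokens": edit_output_tokens,
    }

# With a 20k-token history, a single edit consumes ~40k input tokens:
cost = zed_edit_cost(history_tokens=20_000, edit_output_tokens=500)
```

Because input cost scales with history length, the penalty grows as the conversation gets longer — late-session edits are the expensive ones.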

Step 2: Optimize Your Workflow

If your tools aren’t known for being wasteful but your usage remains high, follow these steps (roughly in order of recommendation) to reduce token bloat:

1. Reduce Frequency

Reduce the frequency of automated workflow runs. For example, if you use OpenClaw, review your current tasks in OpenClaw’s “heartbeat” function and increase the interval between checks.
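The savings from widening an interval are linear and easy to estimate. This assumes the heartbeat interval is expressed in minutes, which may not match OpenClaw's actual configuration.

```python
def heartbeats_per_day(interval_minutes):
    """How many automated runs a fixed interval triggers in 24 hours."""
    return 24 * 60 // interval_minutes

before = heartbeats_per_day(30)    # every 30 min -> 48 runs/day
after = heartbeats_per_day(120)    # every 2 hours -> 12 runs/day
```

Going from 30-minute to 2-hour checks cuts automated request volume by 75% before you change anything else about the workflow.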

2. Prompt Efficiency

Refine your input tokens. Use concise system prompts and AGENTS.md files.

3. Model Tiering

Switch to a cheaper model for simpler tasks; Kimi and GLM don’t need to be running for every single prompt.
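A minimal router makes this concrete. The model ids and the length threshold below are arbitrary placeholders, not Synthetic's catalog or pricing tiers — the point is only that routing can be a one-line decision in your tooling.

```python
CHEAP_MODEL = "small-instruct"   # hypothetical cheaper model id
FRONTIER_MODEL = "glm-5"         # stand-in for a Kimi/GLM-class model

def pick_model(prompt, needs_reasoning=False):
    """Send only hard or explicitly reasoning-heavy tasks to the frontier model."""
    if needs_reasoning or len(prompt) > 2000:
        return FRONTIER_MODEL
    return CHEAP_MODEL
```

Even a crude heuristic like this diverts the bulk of small tasks (renames, one-liners, lookups) away from your most expensive requests.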

4. Limit "Thinking"

For less complex tasks, reduce the model’s budget for thinking. This forces the model to be more direct and prevents it from using tokens on unnecessary internal reasoning.
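In practice this means setting a thinking budget on the request. The exact field name varies by provider and API version — the "reasoning" key below is a placeholder, not a documented Synthetic parameter, so check your provider's API reference for the real one.

```python
def build_request(prompt, thinking_budget_tokens=None):
    """Build a chat request payload, optionally capping internal reasoning.

    The "reasoning" field here is a hypothetical example; real providers
    expose this under different names (or not at all).
    """
    payload = {
        "model": "glm-5",  # stand-in model id
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking_budget_tokens is not None:
        payload["reasoning"] = {"max_tokens": thinking_budget_tokens}
    return payload

# A simple task gets a tight budget; omit it entirely for hard problems.
simple = build_request("Rename foo to bar in utils.py", thinking_budget_tokens=256)
```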

5. Increase Packs

If your workflow is already lean but you still hit limits, it may be time to upgrade your number of packs to match your professional output.