====== Frequently Asked Questions ======
  
===== When will Synthetic get model X? =====

Before you ask this, make sure the model:

  - Is [[https://opensource.org/ai/open-weights|open weight]] or [[https://opensource.org/ai/open-source-ai-definition|open source]].
  - Has weights available on [[https://huggingface.co/models|HuggingFace]] (these are not always the same thing: sometimes a model is //planned// to be open weight but has not been released yet).
  - Has a compatible license that allows Synthetic to actually make money hosting it (sometimes weights are published "openly" but under modified OSS licenses that, for instance, require royalties above a certain profitability threshold).
  - Has an [[https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference|NVFP4]] [[https://ngrok.com/blog/quantization|quantization]] available so that Synthetic can run it on their GPUs at optimal speed (there are exceptions: if a model is in high enough demand, they may make their own quant).
  - Has solid support for that model or its general architecture in [[https://github.com/sgl-project/sglang/issues|sglang]], the inference engine Synthetic uses to actually //run// the models.

Factors that can delay Synthetic getting a model:

  * If it is unusually large, it may take time for them to acquire or free up GPU space to host it.
  * If it has a novel or unusual architecture (such as DeepSeek Sparse Attention for GLM 5), it will take time for inference engines like sglang to gain reliable support for it.
  * If the model has not yet been quantized to NVFP4, Synthetic will have to wait for NVIDIA to do that or make a quant themselves, both of which can take time.

Additionally, it is worth keeping in mind that many models from labs known for creating open-weight models may either be closed source (such as Qwen 3.6-Plus) or available only through the lab's API for user testing and feedback (and to give the lab a profitable head start) but not yet released as open weights. This was the case with GLM 5.1 for a few weeks and remains true as of April 9th, 2026, for MiniMax M2.7.

===== Why is model X from Synthetic not in $preferred_harness's list? =====

Many of these lists are updated by hand, so you might be the first ''$preferred_harness'' user who's noticed the model is missing! You can:

  * Wait until someone else opens a PR to ''$preferred_harness'''s list
  * Find where ''$preferred_harness'' sources its data from and open that PR yourself
    * OpenCode has [[https://github.com/anomalyco/models.dev/|Models.dev]]
    * Crush has [[https://github.com/charmbracelet/catwalk|Catwalk]]
  * Use a provider-specific plugin for ''$preferred_harness'' with a list that updates more frequently
    * Pi has [[https://github.com/ben-vargas/pi-packages/tree/main/packages/pi-synthetic-provider#readme|@benvargas/pi-synthetic-provider]]
  * Maintain your own provider/model list for ''$preferred_harness'' (see the sketch after this list)
    * [[https://opencode.ai/docs/providers#custom-provider|OpenCode]]
    * [[https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/models.md|Pi]]
    * [[https://github.com/charmbracelet/crush?tab=readme-ov-file#openai-compatible-apis|Crush]]
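
For OpenCode, that means adding the provider to your ''opencode.json''. Here is a minimal sketch, assuming Synthetic exposes an OpenAI-compatible endpoint; the base URL and model ID below are placeholders, so check Synthetic's API docs and the linked OpenCode page for real values:

<code json>
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "synthetic": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Synthetic",
      "options": {
        "baseURL": "https://api.synthetic.example/v1",
        "apiKey": "{env:SYNTHETIC_API_KEY}"
      },
      "models": {
        "example-model-id": { "name": "Example Model" }
      }
    }
  }
}
</code>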

===== Why am I burning through my credits or requests so quickly? =====

==== Step 1: Check Your Tools ====

The software you use with Synthetic has a massive impact on your token "burn rate."

=== 1. Are you using Claude Code? ===

**Recommendation: use any other agent.** We strongly recommend //against// using Claude Code as a coding harness with Synthetic. Its underlying infrastructure is notoriously inefficient, causing excessive token bloat, so it will drain your credits much faster than almost any other option.

=== 2. Are you using OpenCode (with oh-my-opencode/oh-my-openagent)? ===

While OpenCode is much better than Claude Code, it is still not optimized for efficiency. If you are also using the oh-my-opencode/oh-my-openagent (OMO) extension, the problem is significantly worse: OMO launches unnecessary, poorly designed subagent workflows that bloat every single prompt with redundant context, leading to a "death by a thousand tokens" scenario.

=== 3. Are you using Zed? ===

Zed is a powerful editor, but its real-time "live-edit" feature comes at a cost. Zed uses a two-step process for edits:

  - **Intent to Edit:** It sends a request to define the goal.
  - **Execution:** It runs a //separate// request with the same chat history to generate the code and live-diff.

Because of this "intent to edit" system, you are essentially using **2x the input tokens** and **2x the requests** for every single edit; with a 30k-token chat history, for example, one edit costs roughly 60k input tokens across the two requests. If this workflow is necessary for you, you may need more packs to sustain it.

==== Step 2: Optimize Your Workflow ====

If your tools aren't known for being wasteful but your usage remains high, follow these steps (roughly in order of recommendation) to reduce token bloat:

=== 1. Reduce Frequency ===

Reduce the frequency of automated workflow runs. For example, if you use OpenClaw, review your current tasks in OpenClaw's "heartbeat" function and increase the interval between checks.
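
As a rough sketch of what that looks like, assuming a JSON5-style ''openclaw.json'' with per-agent heartbeat settings (the exact keys vary by version and are an assumption here, so check OpenClaw's docs):

<code javascript>
// openclaw.json (illustrative; key names are assumptions)
{
  "agents": {
    "defaults": {
      "heartbeat": {
        // run background checks every 2 hours instead of a short default
        "every": "2h"
      }
    }
  }
}
</code>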

=== 2. Prompt Efficiency ===

Trim your input tokens: use concise system prompts and AGENTS.md files, since both are re-sent with every request.
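
As a rough illustration of "concise," an AGENTS.md only needs the facts the model cannot infer from the code itself (the contents below are invented for the example):

<code>
# AGENTS.md
- Build/test: `make test` runs the full suite; run it before finishing.
- Style: gofmt everything; no new dependencies without asking.
- Do not touch: `vendor/` or generated files under `gen/`.
</code>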

=== 3. Model Tiering ===

Switch to a cheaper model for simpler tasks; you don't need Kimi or GLM running for every single prompt.

=== 4. Limit "Thinking" ===

For less complex tasks, reduce the model's thinking budget. This forces the model to be more direct and prevents it from spending tokens on unnecessary internal reasoning.
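
How you set this depends on your harness and model. As one illustrative sketch, OpenAI-compatible servers built on engines like sglang often accept a ''chat_template_kwargs'' field that disables thinking for models which support it (whether Synthetic passes this through is an assumption; check their API docs):

<code json>
{
  "model": "example-model-id",
  "messages": [
    {"role": "user", "content": "Rename `foo` to `bar` in this file."}
  ],
  "chat_template_kwargs": {"enable_thinking": false}
}
</code>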

=== 5. Increase Packs ===

If your workflow is already lean but you still hit limits, it may be time to increase your number of packs to match your professional output.