====== Frequently Asked Questions ======
  
===== When will Synthetic get model X? =====

Before you ask this, make sure the model:

  - Is [[https://opensource.org/ai/open-weights|open weight]] or [[https://opensource.org/ai/open-source-ai-definition|open source]].
  - Has weights available on [[https://huggingface.co/models|HuggingFace]] (these are not always the same thing: sometimes a model is //planned// to be open weight but has not been released yet).
  - Has a compatible license that allows Synthetic to actually make money hosting it (sometimes weights are published "openly" but under modified OSS licenses that, for instance, require royalties above a certain profitability threshold).
  - Has an [[https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference|NVFP4]] [[https://ngrok.com/blog/quantization|quantization]] available so that Synthetic can run it on their GPUs at optimal speed (there are exceptions: if a model is in high enough demand, they may make their own quant).
  - Has solid support for that model or its general architecture in [[https://github.com/sgl-project/sglang/issues|sglang]], the inference engine Synthetic uses to actually //run// the models.

Factors that can delay Synthetic getting a model:

  * If it is unusually large, it may take time for them to acquire or free up GPU space to host it.
  * If it has a novel or unusual architecture (such as DeepSeek Sparse Attention for GLM 5), it will take time for inference engines like sglang to gain reliable support for it.
  * If the model has not yet been quantized to NVFP4, Synthetic will have to wait for NVIDIA to do that or make a quant themselves, both of which can take time.

Additionally, it is worth keeping in mind that many models from labs known for creating open-weight models may either be closed source (such as Qwen 3.6-Plus) or available only through the lab's API for user testing and feedback (and to give the lab a profitable head start) but not yet released as open weights. This was the case with GLM 5.1 for a few weeks and remains true as of April 9th, 2026, for MiniMax M2.7.

===== Why is model X from Synthetic not in $preferred_harness's list? =====

Many of these lists are updated by hand, so you might be the first ''$preferred_harness'' user who's noticed the model is missing! You can:

  * Wait until someone else opens a PR to ''$preferred_harness'''s list
  * Find where ''$preferred_harness'' sources its data from and open that PR yourself
    * OpenCode has [[https://github.com/anomalyco/models.dev/|Models.dev]]
    * Crush has [[https://github.com/charmbracelet/catwalk|Catwalk]]
  * Use a provider-specific plugin for ''$preferred_harness'' with a list that updates more frequently
    * Pi has [[https://github.com/ben-vargas/pi-packages/tree/main/packages/pi-synthetic-provider#readme|@benvargas/pi-synthetic-provider]]
  * Maintain your own provider/model list for ''$preferred_harness'' (see the sketch after this list)
    * [[https://opencode.ai/docs/providers#custom-provider|OpenCode]]
    * [[https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/models.md|Pi]]
    * [[https://github.com/charmbracelet/crush?tab=readme-ov-file#openai-compatible-apis|Crush]]
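
For OpenCode, that means adding the provider to your ''opencode.json''. Here is a minimal sketch, assuming Synthetic exposes an OpenAI-compatible endpoint; the base URL and model ID below are placeholders, so check Synthetic's API docs and the linked OpenCode page for real values:

<code json>
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "synthetic": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Synthetic",
      "options": {
        "baseURL": "https://api.synthetic.example/v1",
        "apiKey": "{env:SYNTHETIC_API_KEY}"
      },
      "models": {
        "example-model-id": { "name": "Example Model" }
      }
    }
  }
}
</code>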

===== Why am I burning through my credits or requests so quickly? =====

==== Step 1: Check Your Tools ====

The software you use with Synthetic has a massive impact on your token "burn rate."

=== 1. Are you using Claude Code? ===

**Recommendation: use any other agent.** We strongly recommend //against// using Claude Code as a coding harness with Synthetic. Its underlying infrastructure is notoriously inefficient, causing excessive token bloat, so it will drain your credits much faster than almost any other option.

=== 2. Are you using OpenCode (with oh-my-opencode/oh-my-openagent)? ===

While OpenCode is much better than Claude Code, it is still not optimized for efficiency. If you are also using the oh-my-opencode/oh-my-openagent (OMO) extension, the problem is significantly worse: OMO launches unnecessary, poorly designed subagent workflows that bloat every single prompt with redundant context, leading to a "death by a thousand tokens" scenario.

=== 3. Are you using Zed? ===

Zed is a powerful editor, but its real-time "live-edit" feature comes at a cost. Zed uses a two-step process for edits:

  - **Intent to Edit:** It sends a request to define the goal.
  - **Execution:** It runs a //separate// request with the same chat history to generate the code and live-diff.

Because of this "intent to edit" system, you are essentially using **2x the input tokens** and **2x the requests** for every single edit; with a 30k-token chat history, for example, one edit costs roughly 60k input tokens across the two requests. If this workflow is necessary for you, you may need more packs to sustain it.

==== Step 2: Optimize Your Workflow ====

If your tools aren't known for being wasteful but your usage remains high, follow these steps (roughly in order of recommendation) to reduce token bloat:

=== 1. Reduce Frequency ===

Reduce the frequency of automated workflow runs. For example, if you use OpenClaw, review your current tasks in OpenClaw's "heartbeat" function and increase the interval between checks.
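
As a rough sketch of what that looks like, assuming a JSON5-style ''openclaw.json'' with per-agent heartbeat settings (the exact keys vary by version and are an assumption here, so check OpenClaw's docs):

<code javascript>
// openclaw.json (illustrative; key names are assumptions)
{
  "agents": {
    "defaults": {
      "heartbeat": {
        // run background checks every 2 hours instead of a short default
        "every": "2h"
      }
    }
  }
}
</code>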

=== 2. Prompt Efficiency ===

Trim your input tokens: use concise system prompts and AGENTS.md files, since both are re-sent with every request.
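
As a rough illustration of "concise," an AGENTS.md only needs the facts the model cannot infer from the code itself (the contents below are invented for the example):

<code>
# AGENTS.md
- Build/test: `make test` runs the full suite; run it before finishing.
- Style: gofmt everything; no new dependencies without asking.
- Do not touch: `vendor/` or generated files under `gen/`.
</code>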

=== 3. Model Tiering ===

Switch to a cheaper model for simpler tasks; you don't need Kimi or GLM running for every single prompt.

=== 4. Limit "Thinking" ===

For less complex tasks, reduce the model's thinking budget. This forces the model to be more direct and prevents it from spending tokens on unnecessary internal reasoning.
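
How you set this depends on your harness and model. As one illustrative sketch, OpenAI-compatible servers built on engines like sglang often accept a ''chat_template_kwargs'' field that disables thinking for models which support it (whether Synthetic passes this through is an assumption; check their API docs):

<code json>
{
  "model": "example-model-id",
  "messages": [
    {"role": "user", "content": "Rename `foo` to `bar` in this file."}
  ],
  "chat_template_kwargs": {"enable_thinking": false}
}
</code>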

=== 5. Increase Packs ===

If your workflow is already lean but you still hit limits, it may be time to increase your number of packs to match your professional output.