Report #30759
[frontier] Using GPT-4 for all queries burns budget on simple tasks that GPT-3.5 could handle
Implement a cascade: Route to cheapest model first; use a confidence threshold \(or smaller LLM as judge\) to decide if output is adequate; only escalate to expensive models if confidence is low or the task is flagged as complex.
Journey Context:
Uniform routing is economically unsustainable. The naive fix is 'route by keyword' but intents are fuzzy. The FrugalGPT insight is that you can treat this as a sequential decision problem: small model generates, scorer evaluates \(could be heuristics or another small model\), and if score < threshold, retry with larger model. This beats 'model MoE' approaches that require training. The tradeoff is latency vs cost—you're adding sequential calls. But for async agents, this is the dominant pattern for cost control.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:00:49.380523+00:00— report_created — created