Report #24575
[cost_intel] High cost-per-correct-answer when using o1 for all requests blindly
Implement an LLM cascade: route roughly 70% of tasks to GPT-4o-mini, 25% to GPT-4o, and only 5% to o1, escalating only when the cheaper model's confidence falls below a threshold.
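As a rough check on what this split buys, here is a minimal sketch of the expected cost per request. It assumes the percentages describe where requests finish, so an escalated request also pays for every cheaper tier it passed through. The o1 cost of 50x GPT-4o-mini is this report's figure. The 15x cost for GPT-4o is an illustrative assumption, not a quoted price, and `cascade_cost` and `TIERS` are hypothetical names.

```python
def cascade_cost(tiers):
    """Expected cost per request for a cascade.

    tiers: list of (relative_cost, share_resolved_here), cheapest first.
    Every request that reaches a tier pays that tier's cost, including
    requests that are later escalated.
    """
    expected = 0.0
    reaching = 1.0  # fraction of requests that reach the current tier
    for cost, resolved in tiers:
        expected += reaching * cost
        reaching -= resolved
    return expected


# Relative costs in units of one GPT-4o-mini call.
# 50.0 for o1 comes from this report; 15.0 for GPT-4o is an assumption.
TIERS = [(1.0, 0.70), (15.0, 0.25), (50.0, 0.05)]

blended = cascade_cost(TIERS)
savings = 1 - blended / 50.0  # compared with sending everything to o1
print(f"blended cost: {blended:.2f} units, savings vs o1-only: {savings:.0%}")
```

Under these assumptions the blended cost is about 8 units per request against 50 for o1-only routing. The result is sensitive to the GPT-4o price and to the real escalation rates, so both should be replaced with measured values.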
Journey Context:
The cost curve is convex: o1 costs about 50x as much as GPT-4o-mini but is only 15-20% better on average tasks. Sending every request to o1 wastes budget on simple classification, where GPT-4o-mini is already more than 95% accurate. The FrugalGPT research reported that cascading can reduce cost by around 90% while maintaining accuracy. The trap is assuming that a better model should always be used. Instead, call the cheap model first, check its logprobs or self-consistency, and escalate only when it is uncertain.
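The escalation logic can be sketched as follows. This is a minimal illustration under stated assumptions, not a tested production router: `ModelReply`, `confidence`, and `cascade` are hypothetical names, each model is passed in as a plain callable so no real client API is assumed, and the confidence score (geometric-mean token probability from logprobs) and the 0.9 threshold are illustrative choices that need calibration on real traffic.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ModelReply:
    text: str
    token_logprobs: Sequence[float]  # natural-log probability of each output token


def confidence(reply: ModelReply) -> float:
    """Geometric-mean token probability; 0.0 when no logprobs are available."""
    if not reply.token_logprobs:
        return 0.0
    return math.exp(sum(reply.token_logprobs) / len(reply.token_logprobs))


def cascade(
    prompt: str,
    tiers: List[Tuple[str, Callable[[str], ModelReply]]],
    threshold: float = 0.9,  # assumed value; tune per task
) -> Tuple[str, ModelReply]:
    """Try tiers cheapest first; return (tier_name, reply) from the first
    tier whose confidence meets the threshold. The last tier is accepted
    unconditionally, so it does not need to return logprobs."""
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, call in tiers[:-1]:
        reply = call(prompt)
        if confidence(reply) >= threshold:
            return name, reply
    name, call = tiers[-1]
    return name, call(prompt)


# Usage with stub models standing in for real API calls.
def stub_mini(prompt: str) -> ModelReply:
    return ModelReply("positive", [-0.01, -0.02])  # high confidence


def stub_o1(prompt: str) -> ModelReply:
    return ModelReply("positive", [])


tier, reply = cascade("Classify: 'great product'", [("mini", stub_mini), ("o1", stub_o1)])
print(tier, reply.text)
```

Self-consistency could replace the logprob score by sampling the cheap model several times and using the agreement rate as `confidence`; the cascade loop itself would stay the same.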
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:39:32.544429+00:00— report_created — created