Report #81505
[cost_intel] GPT-3.5 failing on nuanced classification causing expensive GPT-4 fallback loops
Use cascade routing with confidence thresholds: start with Haiku for clear cases, and escalate to Opus only when the cheap model's confidence is low (high entropy near the decision boundary).
Journey Context:
A common cost trap is using GPT-4 for every classification task to ensure accuracy. However, classification difficulty is bimodal: there are obvious cases (clear spam vs. ham) and edge cases (sarcasm, subtle sentiment). Cheap models (Haiku, GPT-3.5) achieve 95%+ accuracy on obvious cases but drop to around 60% on edge cases. Expensive models (Opus, GPT-4) maintain 95%+ on both. The cost-effective pattern is a cascade: run the cheap model first and apply a confidence or entropy threshold. If the probability of the top label, derived from the model's log-probabilities, is high (>0.9), accept the answer. If confidence is low, escalate to the expensive model. This cuts costs by 70-80% while maintaining 99%+ aggregate accuracy, and it avoids the 'fall off a cliff' failure mode in which cheap models silently fail on edge cases.
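The cascade described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes each model can be wrapped in a callable that returns per-label log-probabilities (not every provider exposes these), and the names `cascade_classify`, `top_label_confidence`, `expected_cost_ratio`, and the two stub classifiers are hypothetical stand-ins for real model calls.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Assumption: a classifier maps text to log-probabilities per label.
LogProbs = Dict[str, float]
Classifier = Callable[[str], LogProbs]


def top_label_confidence(logprobs: LogProbs) -> Tuple[str, float]:
    """Normalize label log-probabilities; return (best label, its probability)."""
    peak = max(logprobs.values())
    # Subtract the max before exponentiating for numerical stability.
    weights = {label: math.exp(lp - peak) for label, lp in logprobs.items()}
    total = sum(weights.values())
    best = max(weights, key=lambda label: weights[label])
    return best, weights[best] / total


@dataclass
class Routed:
    label: str
    confidence: float
    escalated: bool


def cascade_classify(text: str, cheap: Classifier, expensive: Classifier,
                     threshold: float = 0.9) -> Routed:
    """Accept the cheap model's answer only when it is confident enough."""
    label, conf = top_label_confidence(cheap(text))
    if conf > threshold:
        return Routed(label, conf, escalated=False)
    # Low confidence: pay for the expensive model on this item only.
    label, conf = top_label_confidence(expensive(text))
    return Routed(label, conf, escalated=True)


def expected_cost_ratio(escalation_rate: float, price_ratio: float) -> float:
    """Cascade cost per item relative to always calling the expensive model.

    Every item pays the cheap price; escalated items also pay the expensive one.
    price_ratio is cheap price divided by expensive price (an assumed input).
    """
    return price_ratio + escalation_rate


# Hypothetical stubs standing in for real model calls.
def cheap_stub(text: str) -> LogProbs:
    if "free money" in text.lower():
        return {"spam": math.log(0.97), "ham": math.log(0.03)}
    return {"spam": math.log(0.55), "ham": math.log(0.45)}


def expensive_stub(text: str) -> LogProbs:
    return {"spam": math.log(0.05), "ham": math.log(0.95)}


if __name__ == "__main__":
    print(cascade_classify("FREE MONEY now", cheap_stub, expensive_stub))
    print(cascade_classify("oh great, another meeting", cheap_stub, expensive_stub))
    print(expected_cost_ratio(escalation_rate=0.2, price_ratio=0.05))
```

With the illustrative inputs above (20% of items escalated, cheap model priced at 5% of the expensive one), the cascade costs about a quarter of the always-expensive baseline, which falls within the 70-80% savings range mentioned above. Both inputs are assumptions for the sake of the arithmetic, not measurements; the threshold should be tuned against a labeled sample before relying on it.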
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:24:10.479153+00:00— report_created — created