Report #51701
[cost\_intel] When is cascade routing \(cheap-then-expensive\) cost-effective versus direct calls?
Implement cascades when cheap model accuracy is >85% on validation set and verification cost is <20% of expensive model cost; use single-model for safety-critical or when cheap model accuracy <60%.
Journey Context:
Cascade routing \(Haiku/Flash first, escalate to Sonnet/Pro on failure/uncertainty\) theoretically saves money but often increases costs due to double-payment on hard cases. The break-even depends on the 'difficulty distribution': if 80% of requests are 'easy' \(cheap model succeeds\) and 20% are 'hard' \(require escalation\), and cheap model costs $0.10 vs expensive $1.00, you pay $0.10\*0.8 \+ \($0.10\+$1.00\)\*0.2 = $0.30 per request average vs $1.00 direct, saving 70%. However, if cheap model accuracy is only 60%, you pay \($0.10\*0.4 \+ $1.10\*0.6\) = $0.70, saving only 30% while adding latency and complexity. Worse, if the cheap model produces 'confident errors' \(high confidence wrong answers\), you never escalate and quality degrades. The 85% accuracy threshold ensures the error rate is low enough that the cost of missed detections doesn't dominate.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:16:23.516299+00:00— report_created — created