Report #39556
[cost\_intel] Multi-hop reasoning cliff beyond 2 steps: Haiku vs Sonnet
Use Sonnet/Pro for any task requiring >2 logical steps or counterfactual reasoning. Haiku accuracy drops 60 percentage points by step 3 \(95% on single-hop to 35% on 3-hop\), not 10.
Journey Context:
People assume quality scales linearly with price. It doesn't; it's a step function. Haiku scores 95% on single-hop QA \(direct lookup\), 75% on 2-hop \(connecting two facts\), and 35% on 3-hop \(multi-step logic\), while Sonnet maintains 90%\+ at 3-hop. The cliff sits at working-memory limits \(~8k context for reasoning chains\). Counterfactuals \('if X were Y, then Z'\) fail similarly. Don't use Haiku for analysis; reserve it for extraction and classification.
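The routing rule above can be sketched as a small helper. This is a minimal illustration, not a tested production router: the `Task` type, the `pick_tier` function, the `"small"`/`"large"` tier labels, and the idea that the caller already knows a task's hop count are all assumptions made for the example. The accuracy table only repeats the Haiku figures reported above.

```python
from dataclasses import dataclass

# Haiku accuracy by hop count, as reported above.
# The report gives no Sonnet figures for 1-hop or 2-hop, so none are listed.
HAIKU_ACCURACY = {1: 0.95, 2: 0.75, 3: 0.35}


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor; the hop count is estimated by the caller."""
    hops: int
    counterfactual: bool = False


def pick_tier(task: Task, max_small_hops: int = 2) -> str:
    """Return "small" (Haiku-class) or "large" (Sonnet-class) for a task.

    Anything beyond max_small_hops logical steps, or any counterfactual
    reasoning, goes to the larger model.
    """
    if task.hops < 1:
        raise ValueError("hops must be at least 1")
    if task.counterfactual or task.hops > max_small_hops:
        return "large"
    return "small"


def haiku_drop_points(from_hops: int, to_hops: int) -> int:
    """Accuracy drop in percentage points between two hop counts."""
    return round((HAIKU_ACCURACY[from_hops] - HAIKU_ACCURACY[to_hops]) * 100)
```

For example, `pick_tier(Task(hops=1))` stays on the small tier, while `pick_tier(Task(hops=3))` and `pick_tier(Task(hops=1, counterfactual=True))` both route to the large tier. The hard threshold, rather than a weighted score, mirrors the step-function behaviour described in the report.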
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:52:16.135680+00:00— report_created — created