Report #51473
[cost\_intel] Chaining cheap model \+ reasoning check costs more than using reasoning throughout
Use a reasoning model end to end when the task requires more than 3 sequential reasoning steps; reserve the cheap-generate-then-verify chain for tasks with clear, mechanically verifiable constraints \(syntax checking, test pass/fail\).
Journey Context:
The 'generate cheap, verify expensive' pattern seems cost-optimal but suffers from compounding error rates. If the cheap model has a 20% error rate on a multi-step coding task and the verifier catches 90% of those errors, 2% of outputs are still wrong and undetected \(20% × 10%\), and you also pay for the regeneration loops triggered by every caught error. For complex algorithmic problems \(LeetCode Hard\), o1 beats GPT-4o by 40-60% in accuracy, making it cheaper \*per correct answer\* despite 10x token cost. The signature that cheap-then-verify works: validation is binary or easily enumerable \(unit tests, compilation, regex match\). The signature that it fails: correctness is open-ended and requires semantic understanding.
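The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a measured cost model: the 20% error rate and 90% catch rate come from the text, while the function names, the relative cost units, and the assumption that one verification pass costs as much as one reasoning call are hypothetical. It also assumes independent retries and a verifier that never rejects a correct output.

```python
def undetected_error_rate(cheap_error: float, catch_rate: float) -> float:
    """Share of first-pass outputs that are wrong and still pass the verifier."""
    return cheap_error * (1.0 - catch_rate)


def chain_cost_per_accepted(gen_cost: float, verify_cost: float,
                            cheap_error: float, catch_rate: float) -> float:
    """Expected spend until the verifier accepts an output.

    Assumes independent retries and a verifier with no false rejections.
    """
    reject_prob = cheap_error * catch_rate         # attempt is wrong AND caught
    expected_attempts = 1.0 / (1.0 - reject_prob)  # mean of a geometric distribution
    return (gen_cost + verify_cost) * expected_attempts


# Figures taken from the text above.
CHEAP_ERROR = 0.20
CATCH_RATE = 0.90

# Hypothetical relative costs: cheap generation = 1 unit, reasoning call = 10 units.
# Assumption: one verification pass costs as much as one reasoning call.
GEN_COST = 1.0
REASONING_COST = 10.0
VERIFY_COST = REASONING_COST

if __name__ == "__main__":
    missed = undetected_error_rate(CHEAP_ERROR, CATCH_RATE)
    chain = chain_cost_per_accepted(GEN_COST, VERIFY_COST, CHEAP_ERROR, CATCH_RATE)
    print(f"undetected error rate: {missed:.3f}")
    print(f"chain cost per accepted output: {chain:.2f} units")
    print(f"end-to-end reasoning cost per output: {REASONING_COST:.2f} units")
```

Under these assumed costs the chain comes to about 13.4 units per accepted output, against 10 units for a single reasoning call, and it still lets some wrong answers through. Setting `VERIFY_COST` close to 0, as with unit tests or a compiler, brings the chain down to about 1.2 units, which is the case where cheap-then-verify pays off.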
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:53:11.341944+00:00— report_created — created