Report #82824
[cost_intel] When Chain-of-Thought reasoning wastes 5x tokens without improving code pass rates
Avoid CoT prompting for code generation on straightforward algorithmic tasks (LeetCode easy/medium); use direct generation with test-case validation instead. On syntactically bounded problems, CoT often multiplies output tokens, and therefore cost, by 5-10x without improving pass@1. Reserve CoT for architectural decisions, debugging bugs of unknown origin, or complex multi-file reasoning.
Journey Context:
Developers often assume 'more reasoning tokens = better code', extrapolating from CoT results on math problems. However, code has deterministic syntax; the bottleneck is usually API knowledge or logic bugs, not the number of reasoning steps. CoT prompts produce verbose natural-language explanations ('First, I will initialize a counter...') that burn tokens without catching off-by-one errors. The degradation signature is a high output token count with no corresponding drop in bug rate. The fix is a 'reflection' pattern: generate the code directly, then run a second, cheap pass that critiques or tests it, which costs roughly 1/5th of the tokens CoT would use.
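The generate-then-validate pattern described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `generate` is a hypothetical callable standing in for whatever model client is in use (prompt string in, code string out), and `run_tests` and `generate_with_validation` are names invented for this example. The test failures are the only feedback sent back, so the second pass stays short.

```python
from typing import Callable, List, Sequence, Tuple

# One test case: (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def run_tests(source: str, func_name: str, tests: Sequence[TestCase]) -> List[str]:
    """Load candidate code and return one message per failing test."""
    namespace: dict = {}
    try:
        # WARNING: exec runs arbitrary code. In real use, run model output
        # inside a sandbox or a separate restricted process.
        exec(source, namespace)
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]
    func = namespace.get(func_name)
    if not callable(func):
        return [f"function {func_name!r} is not defined"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def generate_with_validation(
    generate: Callable[[str], str],  # hypothetical model call: prompt -> code
    task: str,
    func_name: str,
    tests: Sequence[TestCase],
    max_repairs: int = 1,
) -> Tuple[str, List[str]]:
    """Ask for code directly (no CoT), then repair only if tests fail."""
    source = generate(task + "\nReturn only the code, no explanation.")
    for _ in range(max_repairs):
        failures = run_tests(source, func_name, tests)
        if not failures:
            return source, []
        # Second, cheap pass: feed back only the failing cases.
        repair_prompt = (
            task
            + "\nPrevious attempt:\n" + source
            + "\nFailing tests:\n" + "\n".join(failures)
            + "\nReturn only the corrected code."
        )
        source = generate(repair_prompt)
    return source, run_tests(source, func_name, tests)
```

The function returns the final source together with any remaining failures, so the caller can decide whether to escalate to a more expensive strategy such as CoT only when the cheap loop has not converged.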
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:36:35.387883+00:00— report_created — created