Report #96351
[counterintuitive] Adding chain-of-thought reasoning will fix the model's counting, tracking, or state-management errors
For tasks requiring precise state tracking (counting occurrences, maintaining a stack, tracking positions in a sequence), use code execution or external state management. Chain-of-thought (CoT) improves reasoning decomposition but does not provide reliable mutable working memory.
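As a minimal sketch of what "code execution or external state management" could look like, the snippet below moves two of the named tasks (counting occurrences and maintaining a stack) into ordinary Python. The function names `count_occurrences` and `max_bracket_depth` are hypothetical, chosen for illustration; they are not part of any model or agent framework API. The assumption is that the model calls such a tool and reads back the result instead of tracking the state in its own output.

```python
def count_occurrences(text: str, target: str) -> int:
    """Count non-overlapping occurrences of `target` in `text`."""
    # str.count is deterministic, so the result does not depend on
    # how long the input is or how many steps came before.
    return text.count(target)


def max_bracket_depth(sequence: str) -> int:
    """Return the deepest bracket nesting, using an explicit stack.

    Raises ValueError if the brackets are unbalanced.
    """
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    opening = set(closing_to_opening.values())
    stack = []   # mutable state lives here, outside the model's context
    deepest = 0
    for position, char in enumerate(sequence):
        if char in opening:
            stack.append(char)
            deepest = max(deepest, len(stack))
        elif char in closing_to_opening:
            if not stack or stack.pop() != closing_to_opening[char]:
                raise ValueError(f"unbalanced bracket at position {position}")
    if stack:
        raise ValueError("unclosed bracket at end of input")
    return deepest


if __name__ == "__main__":
    print(count_occurrences("strawberry", "r"))
    print(max_bracket_depth("([]{()})"))
```

The design point is that the stack and the counter are overwritten in place by the interpreter, so the model only has to decide which tool to call and with what input.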
Journey Context:
Chain-of-thought prompting is widely treated as a universal capability amplifier: if the model fails, add CoT. But CoT extends serial computation depth, not working memory. Tasks that require maintaining and updating precise internal state still fail because (1) each CoT step is an autoregressive prediction with a non-zero error rate, (2) there is no mechanism to overwrite or update a variable in place, since the model can only append new text, (3) errors compound across steps without any feedback or correction mechanism, and (4) the model's "working memory" is the growing context, which itself degrades with length. CoT helps with tasks that benefit from decomposition into independent sub-problems (math, multi-step logic); it does not help with tasks requiring mutable state (running counters, tracking game state, maintaining a priority queue). For those, use external computation.
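To make the compounding argument in points (1) and (3) concrete, here is a rough sketch under a simplifying assumption that is not stated in the report: every step succeeds independently with the same probability. Real model errors are neither independent nor uniform, so treat the numbers as an illustration of the trend only. The function name `chain_success_probability` is hypothetical.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes (as a simplification) that steps fail independently and
    share one accuracy value, so the result is accuracy ** steps.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # With no correction mechanism, a single wrong update corrupts
    # every later value of the tracked state.
    for steps in (10, 100, 1000):
        probability = chain_success_probability(0.99, steps)
        print(f"{steps:>4} steps at 99% per-step accuracy: {probability:.4f}")
```

Under these assumptions, 99% per-step accuracy over 100 state updates leaves roughly a 37% chance that the final state is correct, which is why longer counting or tracking tasks degrade even when each individual step looks reliable.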
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:18:33.844879+00:00— report_created — created