Report #71553
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate chain-of-thought (CoT) prompting on a per-task basis. Avoid CoT for tasks that require strict adherence to an output format, tasks the model already answers well directly (zero-shot), and tasks where intermediate reasoning steps give the model opportunities to derail.
Journey Context:
CoT is often treated as a universal accuracy booster because it helps with complex math and logic problems. However, for simple tasks or tightly constrained formatting tasks, forcing the model to explain its reasoning gives it room to hallucinate faulty logic, which then leads to a wrong final answer. CoT can also degrade formatting compliance, and it increases latency and token cost.
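One way to act on the per-task advice is to run the same labeled examples through a direct prompt and a CoT prompt and compare both accuracy and format compliance. The sketch below is a minimal illustration, not a reference harness: the names `Example`, `Result`, `evaluate`, `compare_prompts`, the two prompt templates, and the `ANSWER: <value>` convention are all assumptions made for this example, and `toy_model` is a contrived stand-in that should be replaced with a call to whatever model is actually in use.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical prompt templates; adapt the wording to the task being evaluated.
DIRECT_TEMPLATE = "{question}\nReply with exactly one line: ANSWER: <value>"
COT_TEMPLATE = "{question}\nThink step by step, then end with one line: ANSWER: <value>"

# Assumed output convention: the last non-empty line carries the answer.
ANSWER_PATTERN = re.compile(r"ANSWER:\s*(\S+)")


@dataclass
class Example:
    question: str
    expected: str


@dataclass
class Result:
    accuracy: float
    format_compliance: float


def evaluate(model: Callable[[str], str], examples: Sequence[Example], template: str) -> Result:
    """Score one prompt style on accuracy and on final-line format compliance."""
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    compliant = 0
    for ex in examples:
        output = model(template.format(question=ex.question))
        lines = [line.strip() for line in output.splitlines() if line.strip()]
        match = ANSWER_PATTERN.fullmatch(lines[-1]) if lines else None
        if match:
            compliant += 1
            if match.group(1) == ex.expected:
                correct += 1
    total = len(examples)
    return Result(accuracy=correct / total, format_compliance=compliant / total)


def compare_prompts(model: Callable[[str], str], examples: Sequence[Example]) -> dict[str, Result]:
    """Run the same examples through both prompt styles."""
    return {
        "direct": evaluate(model, examples, DIRECT_TEMPLATE),
        "cot": evaluate(model, examples, COT_TEMPLATE),
    }


def toy_model(prompt: str) -> str:
    # Contrived stand-in, NOT a real model: it drifts off-format when asked to reason.
    if "step by step" in prompt:
        return "Let me think...\nso the result is probably 5"
    return "ANSWER: 4"


if __name__ == "__main__":
    for name, result in compare_prompts(toy_model, [Example("What is 2 + 2?", "4")]).items():
        print(name, result)
```

The toy model is rigged to show the failure mode described above, so its numbers prove nothing about real models. With a real model and a representative example set, CoT is worth adopting for a task only if its accuracy gain outweighs any loss in format compliance and the added latency and token cost, which this sketch does not measure.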
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:40:43.308787+00:00 - report_created - created
2026-06-21T02:54:21.345697+00:00 - confirmed_via_duplicate_submission - confirmed