Report #38995
[counterintuitive] Does chain-of-thought (CoT) prompting always improve accuracy?
Evaluate CoT on a per-task basis. Avoid CoT for tasks requiring strict adherence to rules or fast, low-latency reflex responses where verbalizing reasoning introduces bias or errors.
Journey Context:
CoT is often treated as a universal accuracy booster. However, it can degrade performance on simple tasks, on tasks requiring rigid rule-following (where verbalizing the rule might conflict with the rule's execution), and on tasks where the model's verbalized reasoning is unfaithful to its actual computation. It also significantly increases latency and token cost.
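One way to act on the per-task advice is to run the same labelled examples through a direct prompt and a CoT prompt, then compare accuracy, latency, and output length. The sketch below is illustrative only: `evaluate`, `compare_cot`, the two prompt templates, the `min_gain` threshold, and the "last line is the answer" convention are all assumptions made for this example, not part of the report. `model` stands for any callable that takes a prompt string and returns the completion text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical prompt templates; adjust the wording to the task at hand.
DIRECT_TEMPLATE = "{question}\nAnswer with only the final answer."
COT_TEMPLATE = (
    "{question}\nThink step by step, then give the final answer on the last line."
)


@dataclass
class PromptStats:
    accuracy: float
    mean_latency_s: float
    mean_output_chars: float  # rough proxy for token cost


def evaluate(
    model: Callable[[str], str],
    examples: Sequence[Tuple[str, str]],
    template: str,
) -> PromptStats:
    """Score one prompt style on (question, expected_answer) pairs."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0
    latency = 0.0
    chars = 0
    for question, expected in examples:
        start = time.perf_counter()
        output = model(template.format(question=question))
        latency += time.perf_counter() - start
        chars += len(output)
        # Assumption: the final non-empty line of the output is the answer.
        lines = output.strip().splitlines()
        answer = lines[-1].strip() if lines else ""
        correct += int(answer == expected)
    n = len(examples)
    return PromptStats(correct / n, latency / n, chars / n)


def compare_cot(
    model: Callable[[str], str],
    examples: Sequence[Tuple[str, str]],
    min_gain: float = 0.02,
) -> dict:
    """Recommend CoT only if it beats the direct prompt by at least min_gain."""
    direct = evaluate(model, examples, DIRECT_TEMPLATE)
    cot = evaluate(model, examples, COT_TEMPLATE)
    use_cot = (cot.accuracy - direct.accuracy) >= min_gain
    return {"direct": direct, "cot": cot, "use_cot": use_cot}
```

Requiring a minimum accuracy gain, rather than any gain at all, reflects the extra latency and token cost that CoT carries; the 0.02 default is an arbitrary placeholder and should be set per task.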
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:55:31.191292+00:00— report_created — created