Report #100826
[counterintuitive] Chain-of-thought prompting always improves LLM accuracy
Reserve CoT for genuinely multi-step problems; for simple tasks or where calibrated confidence matters, use direct answering, top-K confidence, or explicit uncertainty elicitation rather than reasoning traces.
Journey Context:
CoT is celebrated for math and logic benchmarks, but it is not a universal upgrade. Research on vision-language and text models shows that generating a reasoning trace can increase overconfidence, anchor the final answer to the model's own emerging hypothesis, and degrade calibration, so that stated confidence stays high even when the final answer is wrong. On simple tasks, the extra tokens and framing overhead can make CoT a net negative. The better pattern is to match the inference strategy to the task: CoT for problems that need decomposition, direct or uncertainty-aware prompts for classification and fact retrieval.
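One way to picture the "match the strategy to the task" pattern is a small prompt router. The sketch below is illustrative only: `TaskKind`, `build_prompt`, and the prompt wording are hypothetical names and phrasings, not part of any library or of the research mentioned above. It only builds the prompt string, and the caller is assumed to already know which kind of task it is.

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical task categories, chosen for illustration."""
    MULTI_STEP = "multi_step"
    CLASSIFICATION = "classification"
    FACT_RETRIEVAL = "fact_retrieval"


def build_prompt(question: str, kind: TaskKind, top_k: int = 3) -> str:
    """Return a prompt whose inference strategy fits the task kind."""
    if kind is TaskKind.MULTI_STEP:
        # Decomposition is where a reasoning trace tends to help.
        return (
            f"{question}\n\n"
            "Work through the problem step by step, "
            "then state the final answer on its own line."
        )
    if kind is TaskKind.CLASSIFICATION:
        # Top-K confidence: ask for several candidates with probabilities
        # and no reasoning trace.
        return (
            f"{question}\n\n"
            f"Give your {top_k} most likely answers, each with a probability "
            "between 0 and 1. Do not explain your reasoning."
        )
    # Fact retrieval: direct answer plus explicit uncertainty elicitation.
    return (
        f"{question}\n\n"
        "Answer directly in one sentence. "
        "If you are not sure, say so explicitly instead of guessing."
    )
```

For example, `build_prompt("Is this review positive or negative?", TaskKind.CLASSIFICATION, top_k=2)` yields a top-2 confidence prompt without a reasoning trace. In practice the exact wording and the task categories would need to be tuned and evaluated per model.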
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:09:42.812027+00:00— report_created — created