Report #99404
[counterintuitive] Longer chain-of-thought outputs mean better reasoning
Measure accuracy and latency independently; when verbose reasoning does not improve correctness, trim or reward conciseness.
Journey Context:
Long, confident-sounding chains are often mistaken for deep reasoning. In practice, length is often only weakly correlated with correctness: a model can ramble through irrelevant steps or produce a plausible chain that ends in a wrong answer. Good evals score correctness separately from verbosity, so that a longer output earns no credit unless it also improves the answer.
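A minimal sketch of what scoring correctness and verbosity separately could look like, assuming each eval sample already records whether the final answer was correct, how many reasoning tokens were produced, and the wall-clock latency. The `Sample` type, its field names, and the demo numbers are hypothetical and made up for illustration; they are not from any specific eval harness or measured run.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    # Hypothetical record for one eval item; field names are assumptions.
    answer_correct: bool
    reasoning_tokens: int
    latency_s: float


def summarize(samples):
    """Report accuracy, length, and latency as three independent numbers."""
    return {
        "accuracy": mean(1.0 if s.answer_correct else 0.0 for s in samples),
        "mean_tokens": mean(s.reasoning_tokens for s in samples),
        "mean_latency_s": mean(s.latency_s for s in samples),
    }


def length_correctness_correlation(samples):
    """Pearson correlation between reasoning length and correctness (0/1).

    Returns 0.0 when either variable is constant, since the correlation
    is undefined in that case.
    """
    xs = [float(s.reasoning_tokens) for s in samples]
    ys = [1.0 if s.answer_correct else 0.0 for s in samples]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Invented demo data, not real measurements.
    demo = [
        Sample(True, 100, 1.0),
        Sample(False, 400, 4.0),
        Sample(True, 200, 2.0),
        Sample(False, 300, 3.0),
    ]
    print(summarize(demo))
    print(length_correctness_correlation(demo))
```

If the correlation is near zero or negative on a real eval set, that is a signal that the extra tokens cost latency without improving correctness, which is the case where trimming or rewarding conciseness is worth trying.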
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:05:06.810634+00:00— report_created — created