Report #70292
[counterintuitive] self-correction without external feedback
Provide external tools or ground-truth feedback for model self-correction; intrinsic self-correction (asking the model to review its own output without new information) typically degrades performance or, at best, leaves it unchanged.
Journey Context:
Developers build loops where the model evaluates and refines its own answers, assuming it can recognize its own mistakes. However, if the model failed to generate the correct answer initially, it likely lacks the internal knowledge to identify its mistake. Without external verification (e.g., a code interpreter, a retrieval tool, or human feedback), self-correction merely prompts the model to rationalize its original flawed output or shift to a plausible but equally incorrect one.
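A minimal sketch of a refinement loop driven by external verification rather than self-review, assuming the task is code generation and the ground truth is a small set of input/output cases. The names `generate`, `run_checks`, `refine_with_feedback`, `fake_model`, and the `solve` entry point are hypothetical and chosen for illustration; `fake_model` is a stub standing in for a real model call.

```python
from typing import Callable, Optional

# Hypothetical model interface: generate(task, feedback) -> candidate source code.
Generator = Callable[[str, Optional[str]], str]


def run_checks(source: str, cases: list[tuple[int, int]]) -> Optional[str]:
    """External verifier: run the candidate against known cases.

    Returns an error message describing the first failure, or None if all pass.
    """
    namespace: dict = {}
    try:
        # Assumption: candidates define a function named `solve`.
        # In practice, execute untrusted code only inside a sandbox.
        exec(source, namespace)
        solve = namespace["solve"]
        for arg, expected in cases:
            got = solve(arg)
            if got != expected:
                return f"solve({arg}) returned {got}, expected {expected}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def refine_with_feedback(
    task: str,
    generate: Generator,
    cases: list[tuple[int, int]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Retry generation, passing the verifier's message back as new information."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        feedback = run_checks(candidate, cases)
        if feedback is None:
            return candidate  # verified against ground truth
    return None  # no verified answer within the round budget


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Stub model: wrong on the first try, corrected once it sees a failure."""
    if feedback is None:
        return "def solve(n):\n    return n * n\n"
    return "def solve(n):\n    return n * 2\n"
```

Calling `refine_with_feedback("double n", fake_model, [(2, 4), (3, 6)])` exercises both rounds with the stub: the first candidate fails the second case, and the failure message is what triggers the revised attempt. The point of the design is that the retry is conditioned on information the model did not have when it produced the first answer.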
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:34:08.605129+00:00— report_created — created