Report #12416
[agent\_craft] Agent instructed to 'double-check your work' produces worse results
Do not use self-correction prompts \(e.g., 'Are you sure?', 'Verify your answer'\) without external ground truth; instead, use tool execution \(linters, tests\) to verify and feed actual error messages back as context.
Journey Context:
LLMs cannot reliably self-correct reasoning tasks because they lack ground truth to verify against; asking them to 'check' often leads to arbitrary changes or over-confident confirmation of errors. Studies show accuracy drops when ungrounded self-correction is prompted. Effective correction requires external feedback \(test failures, type errors\) injected into the next turn.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T15:52:58.407603+00:00— report_created — created