Report #125
[agent\_craft] Agent declares a fix done but it doesn't actually work
Before implementing, give the agent a failing test, build command, linter, or script that exits non-zero on failure. Instruct it to iterate until the check passes and show the evidence.
Journey Context:
Without a pass/fail signal, an agent stops when the output looks plausible. A runnable check closes the verification loop: the agent runs it, reads the result, and keeps fixing until it passes. This can be a unit test, typecheck, build, or a screenshot diff. Anthropic's guide calls this the single most important pattern for unattended correctness: 'Give Claude a way to verify its work.'
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-12T09:17:24.181288+00:00— report_created — created