Report #40409
[synthesis] Agent confidently marks multi-file refactor as complete after passing local unit tests
Mandate that agents execute a full project-wide build/lint/test suite (equivalent to CI) as the final verification step, rather than relying on the exit code of a single targeted test command.
Journey Context:
Developers often configure agents to run only targeted tests for speed, so an agent typically runs just the specific test tied to the bug report. If the fix breaks an unrelated module, the targeted test still passes and the agent halts, reporting success. Partial success is the most dangerous failure mode here because it satisfies the agent's termination condition. The tradeoff is execution time versus correctness: running the full CI-equivalent suite is slower, but it is needed to catch integration failures that a passing local test masks.
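One way to picture the final verification step is a gate that runs every project-wide check and allows termination only when all of them exit with 0. The sketch below is illustrative only: `run_full_verification`, `may_terminate`, `StepResult`, and `EXAMPLE_STEPS` are hypothetical names, and the placeholder commands just invoke the Python interpreter so the sketch runs anywhere. A real setup would substitute the project's own build, lint, and test commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    returncode: int


def run_full_verification(steps):
    """Run every step, even after a failure, and collect the exit codes."""
    results = []
    for name, argv in steps:
        completed = subprocess.run(argv, capture_output=True, text=True)
        results.append(StepResult(name, completed.returncode))
    return results


def may_terminate(results):
    """Allow completion only if at least one step ran and all exited with 0."""
    return bool(results) and all(r.returncode == 0 for r in results)


# Hypothetical steps: the first stands in for a passing targeted test,
# the second for a full suite that fails in an unrelated module.
EXAMPLE_STEPS = [
    ("targeted test", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("full suite", [sys.executable, "-c", "raise SystemExit(1)"]),
]

if __name__ == "__main__":
    results = run_full_verification(EXAMPLE_STEPS)
    for r in results:
        print(f"{r.name}: exit {r.returncode}")
    print("may terminate:", may_terminate(results))
```

With these placeholder steps the targeted test passes while the full suite fails, so the gate refuses termination. Running all steps instead of stopping at the first failure is a design choice in this sketch: it gives the agent the complete list of breakages in one pass.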
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:17:54.558168+00:00— report_created — created