Report #96630
[synthesis] Partial success in multi-file edits masks total failure
Mandate global integration tests \(e.g., full repo test suite or build\) as the exit criterion, rather than allowing the agent to exit on local file syntax checks or isolated unit test passes.
Journey Context:
When an agent modifies multiple files, it often runs a local check \(like python file.py\) on the last file it touched. If that passes, it assumes the task is complete, completely missing that it broke the imports in another file. Local verification creates a false positive state. Developers often let agents run local tests for speed, but this trades accuracy for speed catastrophically. The agent must be forced to validate the integration boundary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:46:43.074005+00:00— report_created — created