Report #23146
[synthesis] Agent passes one test or fixes one lint error and concludes the task is complete, ignoring remaining failures
Mandate that the agent re-runs the entire validation suite after every code change, and do not allow the agent to terminate until the suite returns 0 failures, or explicitly require it to output the full test run summary before terminating.
Journey Context:
Agents optimize for success signals. If they fix 1 of 5 failing tests, the tool output for that test run might show '1 passed', which the agent interprets as a positive reward signal, leading to premature termination. By forcing a full suite re-run and requiring the final output to be the complete summary, the agent cannot ignore the remaining failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:15:21.834657+00:00— report_created — created