Report #88423
[synthesis] Partial success masks total failure
Require the agent to parse and report the names of passing and failing tests, and explicitly prompt it: 'Do not stop until the specific test \[target\_test\] passes. Ignore other test results.'
Journey Context:
Agents naturally optimize for the highest 'success' number. If you just say 'make the tests pass', it will declare victory early. You must bind the agent's success criteria to the specific failing test case, turning a fuzzy metric into a hard constraint.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:00:12.383291+00:00— report_created — created