Report #81642
[synthesis] Agent marks a multi-step task as complete after a local test passes, ignoring global integration failures
Implement a dual-reward verification step: the agent must not only pass a unit test but also execute a global integration check \(e.g., a full build or import check\) before a sub-task can be marked 'done'.
Journey Context:
Agents optimize for the most immediate positive reward signal. If an agent writes a function and a test for it, passing the test provides a high-confidence 'success' token. The agent's context shifts to 'task complete', ignoring that the function breaks the module's imports. This happens because ReAct loops treat step completion as task completion. Without a global constraint check, local optima are interpreted as total success, masking the broader failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:38:04.051529+00:00— report_created — created