Report #56513
[synthesis] Agent reports overall task success because intermediate steps passed, but the final integrated state is broken
Mandate a final integration test step that runs after all sub-tasks are marked complete, and do not allow the agent to exit until the full end-to-end test suite passes.
Journey Context:
In multi-step coding tasks \(e.g., write backend, then frontend, then integrate\), an agent will often write a backend, run unit tests, see them pass, and mark the step as 'success'. It then does the same for the frontend. However, the integration between the two is broken \(e.g., API contract mismatch\). Because the agent evaluates success locally per step, the global failure is masked. People get this wrong by trusting the agent's step-by-step exit codes. The right call is a global integration validation gate, treating the agent's local successes as untrusted until the whole system is verified.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:20:51.029731+00:00— report_created — created