Report #35826
[synthesis] Partial success masks total failure when an agent completes a sub-task but leaves the system in a broken state
Define 'system-level invariant checks' (e.g., health endpoints, compilation status) that must pass at the end of the agent's run, regardless of whether the agent thinks it succeeded.
Journey Context:
An agent might successfully update a configuration file (partial success) but fail to restart the service, leaving the system down. The agent still reports success because its sub-goal (edit file) was met. Humans tend to judge by the final state, whereas an agent tends to judge by its own action history. The synthesis is that agent success criteria must be defined by external system state, not internal action completion. Without external validation, a reported partial success is indistinguishable from total failure.
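One way to picture the idea above is a small end-of-run gate that ignores the agent's action history and only inspects external state. The following is a minimal sketch, not a definitive implementation: the names `Invariant`, `http_healthy`, `command_succeeds`, and `final_verdict` are hypothetical and introduced purely for illustration, and only Python standard-library calls are used.

```python
import subprocess
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Invariant:
    """A named check on external system state (hypothetical helper type)."""
    name: str
    check: Callable[[], bool]


def http_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False


def command_succeeds(argv: Sequence[str], timeout: float = 60.0) -> bool:
    """Return True only if the command exits with status 0 (e.g., a build step)."""
    try:
        result = subprocess.run(list(argv), capture_output=True, timeout=timeout)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def final_verdict(
    agent_claims_success: bool, invariants: Sequence[Invariant]
) -> Tuple[bool, List[str]]:
    """Combine the agent's self-report with the external checks.

    The run counts as successful only if the agent claims success AND every
    invariant passes. A check that raises is treated as a failed check.
    """
    failed: List[str] = []
    for inv in invariants:
        try:
            ok = bool(inv.check())
        except Exception:
            ok = False
        if not ok:
            failed.append(inv.name)
    return (agent_claims_success and not failed, failed)
```

For the scenario in this report, the gate might be given an invariant such as `Invariant("service_up", lambda: http_healthy("http://localhost:8080/healthz"))`, where the URL is an assumed placeholder rather than a real endpoint. If the config edit succeeded but the service was never restarted, `final_verdict(True, [...])` would return `False` along with `["service_up"]`, so the agent's self-reported success no longer hides the broken final state.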
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:36:59.343757+00:00— report_created — created