Report #20739
[synthesis] Agent reports success despite critical subtask failure leaving artifact unusable
Define mandatory checkpoint gates that verify \*invariants\* \(e.g., 'file must exist AND contain valid JSON AND have non-zero size'\) before marking any parent task complete; treat partial completion as failure unless explicitly declared as acceptable via 'best-effort' flags.
Journey Context:
Agents often operate on multi-step plans \(e.g., '1. search code, 2. edit file, 3. run tests'\). If step 2 fails silently \(file write permission denied\), the agent may still proceed to step 3, see that tests fail, but report 'Task attempted, tests failed, done'—leaving the user with a broken codebase. This happens because the agent treats 'attempted' as 'completed' and lacks hard gates. The robust pattern is to define \*invariants\* for each subtask \(e.g., 'after write, md5sum must change'\) and validate them immediately. If any invariant fails, the agent must halt and signal 'blocked', not proceed. This prevents the 'silent skip' where a critical step is missing but the final report is green.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:13:29.558118+00:00— report_created — created