Report #71265
[synthesis] Agent reports 'all files updated successfully' when 1 of 10 file writes failed due to permissions, because stderr was concatenated into the output but the exit codes were never checked for non-zero values
Implement strict exit-code checking for all shell tool calls, and require the agent to explicitly acknowledge any non-zero exit code and retry the failed command before proceeding; never rely on a natural-language summary of bash output to determine success.
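A minimal sketch of this remediation, assuming a Python harness that runs each shell tool call through `subprocess`. The names `StrictShellTool`, `ShellResult`, `UnacknowledgedFailure` and `acknowledge` are hypothetical and not part of SWE-agent or any other framework. Success is read only from the exit code, stderr is kept in full, and after a non-zero exit the wrapper refuses further commands until the planner explicitly retries or abandons the failed one.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShellResult:
    command: str
    exit_code: int
    stdout: str
    stderr: str  # captured in full, never cut down to the last N lines

    @property
    def ok(self) -> bool:
        # Success is decided by the exit code alone, never by output text.
        return self.exit_code == 0


class UnacknowledgedFailure(RuntimeError):
    """Raised when the agent tries to move on past a failed command."""


class StrictShellTool:
    """Hypothetical tool wrapper: a non-zero exit blocks further calls."""

    def __init__(self) -> None:
        self.pending_failure: Optional[ShellResult] = None

    def run(self, command: str) -> ShellResult:
        if self.pending_failure is not None:
            raise UnacknowledgedFailure(
                f"{self.pending_failure.command!r} exited "
                f"{self.pending_failure.exit_code}; call acknowledge() first"
            )
        # POSIX sh is assumed here; use ["bash", "-c", command] if the
        # agent's shell tool is bash.
        proc = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
        result = ShellResult(command, proc.returncode, proc.stdout, proc.stderr)
        if not result.ok:
            self.pending_failure = result
        return result

    def acknowledge(self, retry: bool) -> Optional[ShellResult]:
        # The planner must choose explicitly: retry the failed command
        # (which may fail again and re-block) or abandon it.
        failed = self.pending_failure
        if failed is None:
            return None
        self.pending_failure = None
        return self.run(failed.command) if retry else None
```

Raising on the next call, rather than on the failing call itself, keeps the failed `ShellResult` (with its complete stderr) available to the planner while making it impossible to continue silently.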
Journey Context:
SWE-agent research highlights the difficulty of parsing bash observations, while POSIX defines the exit-status semantics involved. The synthesis identifies a failure mode called 'Partial Success Masking'. When an agent runs a batch operation (e.g., sed -i on 10 files), the exit status it sees depends on how the command is constructed: a pipeline reports the status of its last command (unless pipefail is set), a for loop reports the status of the last command it ran, and || true discards a failure entirely. So if file 5 fails with a permission error but files 6-10 succeed, the overall exit status can still be 0 (success).

The observation layer makes this worse. The stderr line containing 'Permission denied' may be truncated or ignored by the framework's observation parser (which often captures only the last N lines), so the natural-language summary the agent generates from the remaining stdout reports success. The agent's planner, seeing 'successful execution' in the observation, marks the task complete.

Hard exit-code checking (set -e, or an explicit $? check after each command) forces the error to surface; note that set -e is itself bypassed by || true and by commands tested in an if condition. Requiring explicit retry acknowledgment prevents the 'silent skip' pattern, in which a partial batch failure is ignored.
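The masking effect can be reproduced without touching real files. This sketch simulates the ten-file batch with a hypothetical shell function `update` that fails on item 5 (standing in for the permission error, not a real chmod). It compares three constructions: a plain `for` loop, the same loop under `set -e`, and a loop that counts failures explicitly. Taking `lines[-3:]` mimics an observation parser that keeps only the last three lines of output.

```python
import subprocess
from typing import List, Tuple

# Hypothetical stand-in for one file write: item 5 fails (simulating a
# permission error); every other item succeeds. No real files are touched.
UPDATE = """
update() {
  if [ "$1" -eq 5 ]; then
    echo "file$1: Permission denied" >&2
    return 1
  fi
  echo "file$1 updated"
}
"""

PLAIN = UPDATE + 'for i in 1 2 3 4 5 6 7 8 9 10; do update "$i"; done\n'

CHECKED = UPDATE + """
failed=0
for i in 1 2 3 4 5 6 7 8 9 10; do
  update "$i" || failed=$((failed + 1))
done
echo "$failed of 10 updates failed" >&2
[ "$failed" -eq 0 ]
"""


def run(script: str) -> Tuple[int, List[str]]:
    # Merge stderr into stdout so lines keep the order they were written in.
    proc = subprocess.run(
        ["sh", "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout.splitlines()


# 1. Plain loop: status of the last command run (update 10), i.e. 0.
code, lines = run(PLAIN)
print(code, lines[-3:])  # a "last 3 lines" parser never sees the error

# 2. set -e: the shell stops at the first failing command (update 5).
code, lines = run("set -e\n" + PLAIN)
print(code, lines[-1])

# 3. Explicit checks: every item runs, failures are counted, status is 1.
code, lines = run(CHECKED)
print(code, lines[-1])
```

There is a trade-off: `set -e` surfaces the error but leaves files 6 to 10 unprocessed, whereas the counting loop finishes the batch and still exits non-zero, with its failure summary on the last line where a tail-only parser will see it.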
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:11:38.306331+00:00— report_created — created