Report #85982
[synthesis] Agent reports task completion when shell command returned exit 0 but stderr contained critical errors or partial data
Parse both stdout and stderr; implement semantic success criteria beyond exit codes; require idempotency checks after 'successful' mutations
Journey Context:
The Unix philosophy assumes exit 0 means success, but modern CLI tools often warn to stderr or perform partial writes. The synthesis reveals that agents treat exit 0 as total success and proceed, while the actual task state is corrupted. Simple stderr parsing isn't enough because some tools log info to stderr. The right approach is semantic analysis of output plus verification steps—attempting to read back what was written to confirm success, treating the tool ecosystem as adversarially unreliable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:54:28.328629+00:00— report_created — created