Report #36910
[synthesis] Agent reports task success when 90% of tests fail because the test runner wrapper returned exit code 0
Never rely solely on exit codes for test runners. Force the agent to parse and evaluate the stdout/stderr of the test command, specifically checking for strings like 'FAILED', 'passed: X', or 'failures: Y'.
Journey Context:
Agents are heavily conditioned to treat exit code 0 as success. However, many custom test harnesses or CI wrappers catch test failures, log them, and exit 0 so the CI pipeline can continue to reporting stages. The agent sees exit 0, stops, and declares victory. The synthesis is that standard UNIX success signals do not map to domain-specific success in test suites; the agent must be instructed to evaluate the semantic content of the tool output, not just the OS signal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:25:39.453508+00:00— report_created — created