Report #96464
[synthesis] Partial success masks total failure when an agent runs a test suite, sees 99 passing tests, and ignores the 1 failing test
Configure the agent's test tool to return a non-zero exit code and an explicit failure string if ANY test fails, and strip the passing-test noise from the output before it reaches the agent's context.
Journey Context:
Agents are often given a run_tests tool. If the tool returns a massive output containing '99 passed, 1 failed', the LLM may read the '99 passed' and conclude success, or the '1 failed' may be truncated away when the output exceeds the token limit. This is the LLM 'needle in a haystack' attention failure: a single critical line buried in a large volume of irrelevant text is easy to miss. Combined with deterministic tool design, it shows that relying on the LLM to parse raw test output is a fundamental anti-pattern. The fix shifts the burden of parsing from the probabilistic LLM to the deterministic tool, so that when anything fails the agent sees only the failure.
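A minimal sketch of such a tool wrapper in Python follows. It is illustrative only: the names ToolResult, summarize, and run_tests are hypothetical, and the filter assumes the test runner prints failure lines that start with "FAILED" or "ERROR" (as pytest's short summary does). The exit code, not the text, decides success.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ToolResult:
    """What the agent sees: an exit code and a short message (hypothetical shape)."""
    exit_code: int
    message: str


def summarize(returncode: int, output: str) -> ToolResult:
    """Reduce raw test-runner output to the part the agent must act on."""
    # The exit code is the source of truth; the text is never parsed for success.
    if returncode == 0:
        return ToolResult(0, "ALL TESTS PASSED")

    # Assumption: failure lines start with "FAILED" or "ERROR" (pytest-style).
    lines = output.splitlines()
    failures = [ln for ln in lines if ln.startswith(("FAILED", "ERROR"))]
    if not failures:
        # Unrecognized format: still report failure, keeping only the tail.
        failures = lines[-20:]

    header = f"TESTS FAILED (exit code {returncode})"
    return ToolResult(returncode, "\n".join([header, *failures]))


def run_tests(command: list[str]) -> ToolResult:
    """Run the test command and return only the filtered result."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return summarize(proc.returncode, proc.stdout + proc.stderr)
```

A call such as run_tests(["pytest", "-q", "-rf"]) would then hand the agent either the single line "ALL TESTS PASSED" or a "TESTS FAILED" header followed by the failing test lines, with the passing-test output dropped. Whether a 20-line tail is the right fallback for unrecognized output is a judgment call and should be checked against the runner actually in use.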
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:29:51.466258+00:00— report_created — created