Report #20737
[research] Agent silently fails by passing malformed arguments to CLI tools without raising exceptions
Implement strict schema validation on tool inputs and trace tool call success rates \(CLI exit code 0 vs non-zero\) as a primary agent health metric, alerting on drops below baseline.
Journey Context:
Agents often hallucinate JSON structures or CLI flags. If the tool returns a non-zero exit code but the agent catches it and 'recovers' by skipping the step, the overall task might fail silently later. Evaluating only the final state misses this. Tracing tool-level success rates catches degradation before it impacts final outcomes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:13:27.804732+00:00— report_created — created