Report #1492
[research] Agent silently degrades by hallucinating tool outputs or swallowing exceptions
Instrument tool execution spans to capture both the raw tool response (stdout or API response) and the agent's subsequent reasoning step. Implement an 'error propagation' eval: assert that if a tool returns a non-zero exit code or an error JSON payload, the agent's next LLM call explicitly acknowledges the error. Additionally, track per-tool success rates over time as a first-class metric.
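A minimal sketch of the error propagation eval, assuming spans have already been exported into simple records. The `ToolSpan` fields, the "error" key convention for error JSON, and the keyword list are all hypothetical choices for illustration, not part of any particular tracing library. The keyword match is a crude stand-in for acknowledgement detection; an LLM-based judge could replace `acknowledges_error`.

```python
import json
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolSpan:
    """One traced tool call plus the LLM step that followed it (hypothetical schema)."""
    tool_name: str
    exit_code: Optional[int]  # None for tools that have no exit code (e.g. HTTP APIs)
    raw_output: str           # stdout or raw API response body
    next_llm_text: str        # the agent's next reasoning step or message


# Assumed phrasing that counts as acknowledging a failure; tune for your agent.
ERROR_WORDS = re.compile(
    r"\b(error|failed|failure|exception|could not|unable|timed out)\b",
    re.IGNORECASE,
)


def tool_failed(span: ToolSpan) -> bool:
    """A tool failed if it exited non-zero or returned a JSON object with an 'error' key."""
    if span.exit_code not in (None, 0):
        return True
    try:
        payload = json.loads(span.raw_output)
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(payload, dict) and "error" in payload


def acknowledges_error(text: str) -> bool:
    """Heuristic check that the next LLM step mentions the failure."""
    return bool(ERROR_WORDS.search(text))


def blind_spans(spans: List[ToolSpan]) -> List[ToolSpan]:
    """Return spans where the tool failed but the next LLM step ignored it."""
    return [
        s for s in spans
        if tool_failed(s) and not acknowledges_error(s.next_llm_text)
    ]
```

In an eval suite, the assertion would be that `blind_spans(trace)` is empty for every recorded trace; any returned span points at the exact step where the agent ignored a failure.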
Journey Context:
Agents often fail in a way that looks like success: a tool throws an error, the LLM apologizes and hallucinates a plausible resolution, and the pipeline continues. Final-output evals miss this because the agent might eventually self-correct, or the hallucination might look valid. By tracing the actual tool output and the immediate LLM reaction, you can deterministically check whether the agent is 'blind' to tool failures. Tracking tool success rates as a telemetry metric catches a broken environment (e.g., a changed API endpoint) before it affects downstream tasks.
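The success-rate telemetry can be sketched as a rolling window per tool. This is an illustrative assumption about how the metric might be kept in process; the class name `ToolSuccessTracker`, the window size, and the alert threshold are hypothetical, and a real deployment would more likely emit these counts to an existing metrics backend.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class ToolSuccessTracker:
    """Rolling per-tool success rate over the most recent calls (hypothetical helper)."""

    def __init__(self, window: int = 50, alert_below: float = 0.8, min_samples: int = 10):
        self.alert_below = alert_below
        self.min_samples = min_samples
        # Each tool keeps only its last `window` outcomes, so old failures age out.
        self._outcomes: Dict[str, Deque[bool]] = defaultdict(lambda: deque(maxlen=window))

    def record(self, tool_name: str, success: bool) -> None:
        """Record the outcome of one tool call."""
        self._outcomes[tool_name].append(bool(success))

    def success_rate(self, tool_name: str) -> Optional[float]:
        """Fraction of successful calls in the window, or None if the tool is unseen."""
        outcomes = self._outcomes.get(tool_name)
        if not outcomes:
            return None
        return sum(outcomes) / len(outcomes)

    def degraded_tools(self) -> List[str]:
        """Tools with enough samples whose success rate fell below the threshold."""
        return sorted(
            name
            for name, outcomes in self._outcomes.items()
            if len(outcomes) >= self.min_samples
            and sum(outcomes) / len(outcomes) < self.alert_below
        )
```

Requiring `min_samples` before alerting avoids flagging a tool after a single early failure; the threshold and window should be tuned to each tool's normal failure rate.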
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T00:30:40.600677+00:00— report_created — created