Report #62898
[research] Agent silently fails when a tool returns a successful response containing incorrect data
Validate tool outputs against an output schema and with semantic assertions, rather than checking HTTP status codes alone. Run an LLM-as-a-judge check on each intermediate tool result before passing it back into the agent context.
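A minimal sketch of the schema-plus-assertion check, using only the Python standard library. Everything named here is hypothetical and chosen for illustration: the `SEARCH_SCHEMA` shape, the `SEARCH_ASSERTIONS` predicates, and the `check_tool_output` helper are assumptions, not part of any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolCheckResult:
    ok: bool
    problems: list[str] = field(default_factory=list)

# Hypothetical schema for a search tool: required field -> expected type.
SEARCH_SCHEMA: dict[str, type] = {"query": str, "results": list}

# Hypothetical semantic assertions: name -> predicate over the payload.
SEARCH_ASSERTIONS: dict[str, Callable[[dict], bool]] = {
    "results_not_empty": lambda p: len(p["results"]) > 0,
    "results_are_objects": lambda p: all(isinstance(r, dict) for r in p["results"]),
}

def validate_schema(payload: Any, schema: dict[str, type]) -> list[str]:
    """Return a list of structural problems (empty if the payload conforms)."""
    if not isinstance(payload, dict):
        return [f"expected an object, got {type(payload).__name__}"]
    problems = []
    for key, expected in schema.items():
        if key not in payload:
            problems.append(f"missing field: {key}")
        elif not isinstance(payload[key], expected):
            problems.append(f"field {key} should be {expected.__name__}")
    return problems

def check_tool_output(
    status_code: int,
    payload: Any,
    schema: dict[str, type],
    assertions: dict[str, Callable[[dict], bool]],
) -> ToolCheckResult:
    """Check status, then structure, then meaning, before the agent sees the data."""
    problems = []
    if status_code != 200:
        problems.append(f"unexpected status: {status_code}")
    problems += validate_schema(payload, schema)
    # Semantic checks only make sense once the structure is known to be valid.
    if not problems:
        for name, predicate in assertions.items():
            if not predicate(payload):
                problems.append(f"semantic check failed: {name}")
    return ToolCheckResult(ok=not problems, problems=problems)
```

A 200 OK response with an empty result list passes the status and schema checks but is flagged by the `results_not_empty` assertion, so the caller can retry, reformulate the query, or tell the agent explicitly that the search was inconclusive.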
Journey Context:
Agents often call APIs that return 200 OK with empty, malformed, or hallucinated data (e.g., a search tool returns no results, and the agent interprets that as 'the answer is no'). Standard exception handling misses this entirely, because no exception is ever raised. You need intermediate evals (evals on the trace, not just the final output) to catch the point where the agent goes off track due to a bad tool response, before it cascades into further hallucinations.
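One way to picture an intermediate eval over the trace is sketched below. The `ToolStep` and `Verdict` types and the `evaluate_trace` helper are hypothetical names introduced for illustration. The judge is a plain callable; `heuristic_judge` is a deterministic stand-in, and in practice that slot would presumably be filled by a function that prompts an LLM to grade the tool result.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolStep:
    tool: str
    request: str
    response: dict

@dataclass
class Verdict:
    step_index: int
    passed: bool
    reason: str

# A judge maps one tool step to (passed, reason). An LLM-backed judge
# would have the same signature; this alias is an assumption of the sketch.
Judge = Callable[[ToolStep], tuple[bool, str]]

def heuristic_judge(step: ToolStep) -> tuple[bool, str]:
    """Deterministic stand-in for an LLM judge (illustrative only)."""
    results = step.response.get("results")
    if not isinstance(results, list):
        return False, "response has no usable 'results' list"
    if len(results) == 0:
        # An empty result set is missing evidence, not a negative answer.
        return False, "empty result set: inconclusive, not 'the answer is no'"
    return True, "ok"

def evaluate_trace(trace: list[ToolStep], judge: Judge) -> list[Verdict]:
    """Judge each tool step in order and stop at the first failure."""
    verdicts = []
    for i, step in enumerate(trace):
        passed, reason = judge(step)
        verdicts.append(Verdict(i, passed, reason))
        if not passed:
            break  # later steps would build on the bad response
    return verdicts

def first_failure(verdicts: list[Verdict]) -> Optional[Verdict]:
    """Return the first failing verdict, or None if every step passed."""
    return next((v for v in verdicts if not v.passed), None)
```

Stopping at the first failing step is a design choice for this sketch: it pinpoints where the trace went wrong instead of only reporting that the final answer was bad.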
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:03:25.376125+00:00 - report_created - created