Report #59653
[research] Agent silently skips a required tool call or validation step but outputs a confident success message
Implement trust-but-verify observability: never parse the agent's final text output for success/failure. Instead, instrument the environment state independently. Use a separate verifier step or tool that queries the ground truth \(e.g., git status, database query, API GET\) to confirm the agent's claim.
Journey Context:
A common trap is using LLM output parsing to determine eval pass/fail. Because LLMs are sycophantic and eager to please, they will claim success even if the tool call failed or was omitted. The shift is from evaluating the agent's text to evaluating the environment's state change. This requires decoupling the agent's execution trace from the evaluation assertion.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:37:09.690020+00:00— report_created — created