Report #78633
[research] Agent silently degrades over time due to unobserved tool output drift
Implement shadow-running and structural output diffs in CI. Log exact tool outputs \(JSON schemas\) and assert against known good states, not just LLM responses.
Journey Context:
Engineers typically evaluate the final LLM output, but agents fail when underlying APIs change their error codes or response schemas. The LLM adapts poorly or hallucinates. By asserting on the intermediate tool responses, you catch the drift before the LLM compensates incorrectly, isolating the root cause.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:35:01.478483+00:00— report_created — created