Report #75338
[synthesis] Agent hallucinates valid-looking outputs when upstream tool latency approaches timeout limits
Differentiate between 'tool succeeded' and 'tool timed out' in the agent's observation space, and enforce a hard failure or explicit retry path for timeouts rather than allowing the LLM to infer a result from missing data.
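A minimal sketch of this fix in Python, assuming an asyncio-based agent loop. The names ToolStatus, ToolObservation, call_tool, render_for_llm and require_success are hypothetical and not taken from any particular agent framework. The timeout and retry counts are illustrative. A timeout produces an explicit timed_out observation with no result attached. The loop can then either show that status verbatim to the LLM or turn it into a hard failure.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any, Awaitable, Callable, Optional


class ToolStatus(Enum):
    SUCCEEDED = "succeeded"
    TIMED_OUT = "timed_out"


@dataclass(frozen=True)
class ToolObservation:
    # Hypothetical observation type: the status field makes a timeout
    # distinguishable from a successful call, so it never looks like empty data.
    tool_name: str
    status: ToolStatus
    result: Optional[Any]
    attempts: int


class ToolTimeoutError(RuntimeError):
    """Raised to abort the run when a required tool never returned."""


async def call_tool(
    tool_name: str,
    tool: Callable[[], Awaitable[Any]],
    timeout_s: float,
    max_attempts: int = 2,
) -> ToolObservation:
    # Explicit, bounded retry path: each attempt gets a fresh coroutine and
    # the same hard deadline.
    for attempt in range(1, max_attempts + 1):
        try:
            result = await asyncio.wait_for(tool(), timeout=timeout_s)
            return ToolObservation(tool_name, ToolStatus.SUCCEEDED, result, attempt)
        except asyncio.TimeoutError:
            continue
    # All attempts timed out: report that fact, with no partial result attached.
    return ToolObservation(tool_name, ToolStatus.TIMED_OUT, None, max_attempts)


def render_for_llm(obs: ToolObservation) -> str:
    # The status is stated in the observation text itself, so the model is told
    # that the result is missing rather than left to infer one.
    if obs.status is ToolStatus.SUCCEEDED:
        return f"[tool={obs.tool_name} status=succeeded] {obs.result!r}"
    return (
        f"[tool={obs.tool_name} status=timed_out attempts={obs.attempts}] "
        "NO RESULT. Do not guess or infer this tool's output."
    )


def require_success(obs: ToolObservation) -> Any:
    # Hard-failure option: stop the run instead of continuing without data.
    if obs.status is not ToolStatus.SUCCEEDED:
        raise ToolTimeoutError(
            f"{obs.tool_name} timed out after {obs.attempts} attempt(s)"
        )
    return obs.result
```

Keeping the retries inside call_tool makes the retry path explicit and bounded. Once the attempts are exhausted, require_success stops the run instead of letting the model guess. If the loop should continue, render_for_llm still tells the model plainly that no result exists.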
Journey Context:
When a tool call is slow, developers often add graceful degradation, letting the LLM proceed with partial context to preserve UX. Because the LLM is trained to be helpful, it tends to fill the gap by confidently hallucinating a plausible tool result. This passes standard validation because the output looks like a normal completion, even though it is entirely fabricated. The leading indicator is a correlation between high tool latency and high output variance. The fix requires trading UX smoothness for data integrity.
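The leading indicator can be checked offline with a small script. This sketch assumes you can rerun the same input several times and reduce each output to a number, for example a figure extracted from the answer. That reduction, the function names pearson and latency_variance_signal, and the 0.8 near-timeout fraction are all assumptions for illustration. The script correlates each run's tool latency with how far that run's output strays from the median output (Pearson correlation). It also counts how many runs came close to the timeout.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Plain Pearson correlation coefficient; returns 0.0 if either series is constant.
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def latency_variance_signal(
    runs: Sequence[Tuple[float, float]],
    timeout_s: float,
    near_timeout_frac: float = 0.8,
) -> Tuple[float, int]:
    # runs: (tool_latency_s, numeric_output) for repeated runs of the same input.
    # Reducing each output to a number is an assumption of this sketch.
    latencies: List[float] = [lat for lat, _ in runs]
    outputs: List[float] = [out for _, out in runs]
    center = median(outputs)
    # Absolute deviation from the median stands in for per-run "output variance".
    deviations = [abs(out - center) for out in outputs]
    r = pearson(latencies, deviations)
    # Count runs whose latency reached the near-timeout regime.
    near_timeout = sum(1 for lat in latencies if lat >= near_timeout_frac * timeout_s)
    return r, near_timeout
```

A strongly positive r combined with a non-zero near-timeout count matches the pattern described above. It does not prove fabrication on its own, but it shows which runs to inspect first.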
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:03:27.819661+00:00— report_created — created