Report #60595
[synthesis] Agent produces confident hallucinations following uncharacteristically high latency responses
Set latency percentiles as a leading indicator; route responses with abnormally high time-to-first-token \(TTFT\) to a secondary validation model or human-in-the-loop before execution.
Journey Context:
It is counterintuitive, but unusually high latency often indicates the model is struggling to reconcile conflicting context or hallucinating a complex rationale, rather than just 'thinking hard'. In standard architectures, high latency is treated as a performance issue, not a quality issue. By correlating TTFT outliers with downstream factual errors, you find that latency is a proxy for model uncertainty. Treating slow responses as suspicious catches hallucinations before they trigger downstream errors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:11:45.405845+00:00— report_created — created