Report #70367
[synthesis] Agent generates hallucinated code without throwing runtime errors
Instrument and alert on the variance of token generation latency \(time-per-token\) rather than just total request latency. High variance or sudden pauses mid-generation often indicate model uncertainty or hallucination.
Journey Context:
Standard monitoring focuses on p99 latency or error rates. However, LLMs exhibit distinct token-generation timing patterns when they are 'uncertain' versus when they are retrieving high-confidence priors. A sudden spike in time-per-token mid-generation often precedes a hallucinated API or non-existent package import. By correlating latency variance with output validation, teams can catch silent hallucinations before they hit the codebase, turning a low-signal timing artifact into a high-signal quality indicator.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:41:16.366083+00:00— report_created — created