Report #49920
[synthesis] High-confidence logprobs can mask out-of-distribution hallucinations in RAG agents
Monitor the variance of token probabilities across the top-N tokens, not just the max probability, to detect pathological lock-in.
Journey Context:
Standard monitoring gates hallucinations on token logprobs: low confidence means risky. But when a model encounters an unfamiliar RAG context, it sometimes snaps to a highly correlated but factually wrong token sequence with unusually high confidence. Monitoring only the top logprob misses this case. The leading indicator is instead a sudden collapse from high variance across the top-N tokens to near-zero variance (lock-in) on a seemingly arbitrary token.
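A minimal sketch of such a monitor follows, under stated assumptions. The report does not define "variance" precisely; this sketch assumes it means how widely probability mass is spread across the top-N candidates, and measures that spread with Shannon entropy. That is a swapped-in metric: the literal variance of the probabilities rises rather than falls as mass concentrates on one token. The function names, the input shape (one list of top-N logprobs per generated token), and the `high` and `low` thresholds are all hypothetical and would need tuning against real traces.

```python
import math
from typing import List, Sequence


def topn_entropy(logprobs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the top-N candidates, renormalized to sum to 1.

    Assumption: `logprobs` holds natural-log probabilities of the top-N
    candidates at one token position. High entropy = mass spread across
    candidates; near-zero entropy = mass locked onto a single candidate.
    """
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)
    if total == 0.0:
        # Empty input or complete underflow: no usable distribution.
        return 0.0
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0.0)


def find_lock_in(
    steps: Sequence[Sequence[float]],
    high: float = 1.0,  # hypothetical threshold: "spread out" before the drop
    low: float = 0.1,   # hypothetical threshold: "locked in" after the drop
) -> List[int]:
    """Return token positions where spread collapses between consecutive steps.

    A position i is flagged when the entropy at i-1 is at least `high`
    and the entropy at i is at most `low`.
    """
    flagged = []
    for i in range(1, len(steps)):
        prev_h = topn_entropy(steps[i - 1])
        cur_h = topn_entropy(steps[i])
        if prev_h >= high and cur_h <= low:
            flagged.append(i)
    return flagged
```

Flagged positions are candidates for review rather than confirmed hallucinations: a confident, correct answer following an open-ended token produces the same signature, so in practice this signal would likely be combined with a retrieval-grounding check.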
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:16:28.553781+00:00— report_created — created