Report #54895
[synthesis] Agent starts generating plausible but incorrect outputs without triggering guardrails
Capture and monitor the delta between the top-1 and top-2 log probabilities at critical generation steps (such as entity extraction or decision routing). A shrinking gap indicates that the model is torn between two distinct paths, which tends to correlate strongly with subsequent hallucinations or logic errors.
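A minimal sketch of the gap computation follows. It assumes your provider returns the top-k candidate log probabilities (k of at least 2) for each generated token as plain natural-log floats. The function names `top2_gap` and `span_min_gap` are hypothetical, not part of any library. Because ln p1 - ln p2 = ln(p1/p2), exp(gap) is the probability ratio between the two best candidates, so a gap near 0 means they are almost equally likely.

```python
import math
from typing import Sequence


def top2_gap(candidate_logprobs: Sequence[float]) -> float:
    """Return logprob(top-1) - logprob(top-2) for one token position.

    Assumes `candidate_logprobs` holds the natural-log probabilities of the
    top-k candidates at that position (k >= 2), in any order.
    """
    if len(candidate_logprobs) < 2:
        raise ValueError("need at least two candidates to compute a gap")
    first, second = sorted(candidate_logprobs, reverse=True)[:2]
    return first - second


def span_min_gap(positions: Sequence[Sequence[float]]) -> float:
    """Smallest gap across all token positions of a critical span.

    Hypothetical design choice: for a multi-token span (e.g. an extracted
    entity or a routing label), the least confident token is treated as
    the span's confidence.
    """
    return min(top2_gap(candidates) for candidates in positions)


# Two token positions: one confident, one nearly a coin flip.
span = [[-0.01, -4.7, -6.0], [-0.69, -0.70, -3.2]]
gap = span_min_gap(span)
ratio = math.exp(gap)  # p1 / p2 at the weakest position, close to 1.0 here
```

Taking the minimum rather than the mean keeps one uncertain token from being averaged away by confident neighbours; whether that is the right aggregation depends on how your critical steps are tokenized.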
Journey Context:
Guardrails usually trigger on toxic or out-of-domain outputs, but a degrading agent often just becomes uncertain, producing plausible, in-domain text that passes those checks. When logprobs are available, the gap between the chosen token and the next-best alternative is a direct measure of the model's confidence at that step: a shrinking gap means the model is closer to guessing. Monitoring this delta therefore gives you a leading indicator of quality erosion before the output actually becomes wrong, combining a model-internal signal with your output quality metrics.
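One way to turn per-step gaps into a leading indicator is to compare a rolling window of recent gaps against a baseline captured while the agent was known to be healthy. The sketch below is hypothetical throughout: the `GapMonitor` class, its parameters (`baseline_size`, `window`, `drop_ratio`), and the "recent mean below a fraction of baseline mean" rule are illustrative assumptions, not an established method, and the thresholds would need tuning against your own quality metrics.

```python
from __future__ import annotations

from collections import deque
from statistics import fmean


class GapMonitor:
    """Hypothetical drift detector for top-1/top-2 logprob gaps.

    The first `baseline_size` gaps form a frozen baseline. After that, an
    alert fires whenever the mean of the last `window` gaps drops below
    `drop_ratio` times the baseline mean.
    """

    def __init__(self, baseline_size: int = 200, window: int = 50,
                 drop_ratio: float = 0.5) -> None:
        if not 0 < drop_ratio < 1:
            raise ValueError("drop_ratio must be in (0, 1)")
        self.baseline_size = baseline_size
        self.drop_ratio = drop_ratio
        self._baseline: list[float] = []
        self._recent: deque[float] = deque(maxlen=window)

    def observe(self, gap: float) -> bool:
        """Record one critical step's gap; return True if the alert condition holds."""
        if len(self._baseline) < self.baseline_size:
            # Still collecting the healthy-period baseline: never alert.
            self._baseline.append(gap)
            return False
        self._recent.append(gap)
        if len(self._recent) < self._recent.maxlen:
            return False  # wait until the rolling window is full
        return fmean(self._recent) < self.drop_ratio * fmean(self._baseline)


# Tiny parameters so the behaviour is visible: baseline mean 4.0, alert below 2.0.
monitor = GapMonitor(baseline_size=4, window=2, drop_ratio=0.5)
alerts = [monitor.observe(g) for g in [4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 0.5]]
```

Here the last observation pulls the recent mean to 1.75, below half the baseline, so only the final call returns True. A relative threshold is used because absolute gap values vary across models and prompts; comparing against the agent's own baseline keeps the alert about erosion rather than about a fixed number.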
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:38:12.133164+00:00— report_created — created