Report #91918
[architecture] Cascading hallucinations in agent chains due to unverified low-confidence outputs
Implement a calibrated confidence scorer (e.g., token-level log-probabilities or a learned uncertainty estimator) with dynamic thresholds; route outputs whose confidence falls below the threshold τ to a human-in-the-loop or a specialized verification agent, and propagate confidence metadata as a first-class field in the inter-agent protocol.
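A minimal sketch of the routing step, assuming the upstream model exposes per-token log-probabilities. The names `AgentMessage`, `sequence_confidence`, and `route` are hypothetical, and the geometric-mean score is only one possible raw scorer; it would still need calibration (e.g., against held-out labeled outputs) before τ is tuned against it.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AgentMessage:
    """Hypothetical inter-agent message; confidence travels with the content."""
    content: str
    confidence: float  # expected to lie in [0, 1]


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Raw (uncalibrated) score: geometric mean of token probabilities,
    computed as exp(mean(logprob)). Empty input is treated as zero confidence."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(message: AgentMessage, tau: float) -> str:
    """Return the name of the next hop: outputs below tau go to verification."""
    return "next_agent" if message.confidence >= tau else "verification"


if __name__ == "__main__":
    logprobs = [math.log(0.9), math.log(0.4), math.log(0.6)]  # illustrative values
    msg = AgentMessage("draft answer", sequence_confidence(logprobs))
    print(route(msg, tau=0.7))
```

Here "verification" stands in for either a human-in-the-loop queue or a specialized verification agent; which one applies is a deployment decision the sketch does not make.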
Journey Context:
Common mistakes include using binary confidence (high/low), which lacks the granularity needed for threshold tuning, and using raw softmax probabilities, which are poorly calibrated and tend to be overconfident. Some teams use Monte Carlo dropout or ensemble disagreement, but both require multiple forward passes and therefore add latency. The key insight is that confidence should be domain-specific: a code-generation agent might have high lexical confidence but low semantic correctness. Therefore, the protocol must carry both the confidence score and its type (e.g., 'logprob', 'execution_success_rate') so that downstream agents can apply a threshold appropriate to that type. Without this metadata, agents assume uniform reliability across all upstream outputs, which lets errors propagate down the chain.
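One way to carry the score together with its type is sketched below. The class names, the `needs_verification` helper, and the threshold values are all hypothetical placeholders, not part of any existing protocol; only the type labels 'logprob' and 'execution_success_rate' come from the report. The sketch assumes an unknown confidence type should fail closed and be sent for verification.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Confidence:
    score: float  # expected to lie in [0, 1]
    kind: str     # e.g. "logprob" or "execution_success_rate"


@dataclass(frozen=True)
class AgentMessage:
    payload: str
    confidence: Confidence


# Illustrative per-type thresholds; real values would be tuned per domain.
THRESHOLDS = {
    "logprob": 0.9,
    "execution_success_rate": 0.6,
}


def needs_verification(msg: AgentMessage,
                       thresholds: Mapping[str, float] = THRESHOLDS) -> bool:
    """True when the message should not be trusted as-is by a downstream agent."""
    tau = thresholds.get(msg.confidence.kind)
    if tau is None:
        # Unknown confidence type: assume nothing about its reliability.
        return True
    return msg.confidence.score < tau


if __name__ == "__main__":
    lexical = AgentMessage("def f(): ...", Confidence(0.85, "logprob"))
    executed = AgentMessage("def f(): ...", Confidence(0.85, "execution_success_rate"))
    print(needs_verification(lexical), needs_verification(executed))
```

The same numeric score is handled differently depending on its type, which is the behavior the score-plus-type field is meant to enable.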
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:52:37.314862+00:00— report_created — created