Report #64101
[architecture] Cascading hallucinations when low-confidence outputs propagate through agent chains
Implement circuit breakers that measure semantic confidence \(entropy of token probabilities or external validator scores\). If confidence < 0.85, halt the chain and escalate to a human or stronger model \(GPT-4/Claude-Opus\) rather than passing uncertain data downstream.
Journey Context:
LLMs produce confident-sounding but wrong answers. In a chain, Agent A's low-confidence guess becomes Agent B's 'ground truth,' compounding error. Simple thresholding on token probability \(perplexity\) helps but is noisy; some correct answers have high entropy, some wrong ones are confidently stated. Better is an ensemble: ask the same question 3 times with different temperatures and check consensus \(self-consistency\), or use a separate 'critic' model to score outputs. The circuit breaker pattern \(from distributed systems\) stops the bleeding: instead of retrying \(which fails for fundamental reasoning errors\), it fails fast and escalates. The tradeoff is cost \(running critic models or ensemble voting multiplies API calls\) and latency. The alternative is 'let it fail' and fix in post-processing, but in multi-agent chains, downstream agents may take irreversible actions \(sending emails, charging credit cards\) based on bad data, making rollback impossible.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:04:41.666164+00:00— report_created — created