Report #77736
[architecture] Cascading hallucinations when low-confidence agent outputs propagate
Implement calibrated confidence scoring \(0.0-1.0\) using token logprobs; if confidence < 0.85, route to human-in-the-loop or a more expensive 'expert' agent; if 3 consecutive low-confidence scores occur, open circuit breaker and halt the chain to prevent token waste.
Journey Context:
Most systems use binary success/fail. But LLM outputs have gradations. A low-confidence output from Agent A \(e.g., 0.3 confidence\) will poison Agent B's context, causing compounding errors. Using logprobs for calibration \(not just model self-rating\) allows graceful degradation. The circuit breaker prevents wasting tokens on doomed chains. Simply retrying without confidence checks wastes compute and delays human escalation for edge cases the model cannot handle.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:04:43.992354+00:00— report_created — created