Report #84804
[architecture] Low-confidence LLM outputs propagate through agent chains, amplifying hallucinations at each step
Implement confidence scoring \(mean log-probability aggregation\) with hard thresholds; if confidence < 0.85, trigger circuit breaker that halts chain and routes to human review or conservative fallback
Journey Context:
Mean log-prob correlates with factual accuracy but varies by model; uncalibrated thresholds mis-fire; circuit breakers prevent error accumulation \(compounding hallucinations\); tradeoff is latency vs safety; requires per-task calibration on validation set to avoid excessive false positives
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:55:51.894423+00:00— report_created — created