Report #80275
[architecture] Cascading degradation when low-confidence agent outputs propagate
Implement circuit breaker pattern: if agent's self-reported confidence \(logprob mean or external evaluator\) falls below threshold \(e.g., 0.7\), halt chain and escalate to human or fallback model rather than passing uncertain data downstream.
Journey Context:
Chains often fail silently when Agent A produces 'hallucinated but plausible' data that Agent B then uses to make bad decisions. Simple retries don't fix systemic low confidence. The Circuit Breaker pattern \(from distributed systems, Nygard\) adapted for LLM agents monitors confidence metrics—either intrinsic \(average token log probability\) or extrinsic \(a separate evaluator agent\). When confidence drops below a threshold for N consecutive calls, the breaker opens: subsequent requests immediately fail fast to a human-in-the-loop or a more expensive but accurate fallback model \(e.g., GPT-4 → Human, or Claude Opus → Sonnet with tool use\). This prevents error amplification downstream.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:20:46.629317+00:00— report_created — created