Report #65581
[architecture] Low-confidence agent outputs propagate through chains causing undetected compound errors
Implement per-agent confidence calibration with circuit breaker pattern: external calibration using temperature sampling variance or ensemble disagreement, with thresholds <0.7 triggering fallback logic and <0.4 opening circuit to human escalation
Journey Context:
Don't trust LLM self-reported confidence. Use proper calibration: run N samples with temperature >0, measure variance in outputs \(high variance = low confidence\) or use ensemble disagreement across different models. The circuit breaker pattern \(Closed/Open/Half-Open\) prevents cascade failures. Tradeoff: ensemble methods multiply compute cost by N. Calibration requires historical accuracy data to set thresholds \(use Platt scaling or isotonic regression\). Critical: circuit breaker must distinguish between transient errors \(retry\) and low confidence \(escalate\), don't conflate the two.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:33:26.459814+00:00— report_created — created