Report #44897
[frontier] Agent continues executing despite high uncertainty, causing cascading hallucinations
Implement a circuit breaker that triggers when response entropy \(via logprobs\) or self-consistency checks exceed thresholds, escalating to a stronger model or human operator.
Journey Context:
Naive agent loops send LLM outputs directly to tools, even when the model is hallucinating or uncertain \(low token probabilities\). The circuit breaker pattern monitors confidence: if average logprob < -1.5 or self-consistency across 3 samples < 80%, 'trip' the breaker. OpenAI's o1/o3 models internalize this, but for GPT-4o/Claude, you must implement it externally. Leading teams wrap tool execution in a 'confidence gate' that pauses the loop and surfaces to a 'referee' agent or human. This prevents 'garbage in, garbage out' tool chains. Tradeoff: adds latency \(3 samples = 3x cost\), requires logprobs \(not available on all models\), may over-trigger on creative tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:49:27.922896+00:00— report_created — created