Report #87347
[architecture] Agents silently hallucinating or proceeding with low-confidence outputs instead of asking for help
Require agents to output a structured confidence score (0.0-1.0) and a boolean needs_help flag. Implement a deterministic orchestrator router that intercepts the handoff and escalates to a human or stronger model if the confidence is below a set threshold.
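A minimal sketch of such a router, assuming the agent replies with a JSON object. The field name answer, the names AgentResult, parse_result and route, and the default threshold of 0.7 are illustrative assumptions, not part of the report. The sketch also assumes that a set needs_help flag or a reply that breaks the contract should escalate, which the report does not state explicitly.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    CONTINUE = "continue"
    ESCALATE = "escalate"  # hand off to a human or a stronger model


@dataclass(frozen=True)
class AgentResult:
    answer: str        # hypothetical payload field
    confidence: float  # self-reported, expected in [0.0, 1.0]
    needs_help: bool


def parse_result(raw: str) -> AgentResult:
    """Parse the agent's JSON reply; raise ValueError if it breaks the contract."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    answer = data.get("answer")
    confidence = data.get("confidence")
    needs_help = data.get("needs_help")
    if not isinstance(answer, str):
        raise ValueError("answer must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:  # also rejects NaN
        raise ValueError("confidence must be in [0.0, 1.0]")
    if not isinstance(needs_help, bool):
        raise ValueError("needs_help must be a boolean")
    return AgentResult(answer, float(confidence), needs_help)


def route(raw: str, threshold: float = 0.7) -> Route:
    """Deterministic check on the handoff; no LLM call is involved here."""
    try:
        result = parse_result(raw)
    except ValueError:
        # Assumption: a malformed reply is treated like a request for help.
        return Route.ESCALATE
    if result.needs_help or result.confidence < threshold:
        return Route.ESCALATE
    return Route.CONTINUE
```

The orchestrator would call route() on every handoff and dispatch on the returned value, so the escalation decision never depends on the agent choosing to act on its own score.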
Journey Context:
Relying on an LLM to decide on its own when to escalate is unreliable, because hallucinations often come with falsely high confidence. By forcing the LLM to output a confidence score as part of its structured contract, and applying a deterministic programmatic check to that score, you decouple the assessment from the action. The tradeoff is that LLM-reported confidence scores are poorly calibrated, so the threshold needs empirical tuning and is often best paired with checks that the output actually satisfies its constraints.
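One way the empirical tuning could look, as a sketch only: collect past agent outputs whose correctness was checked by hand, then pick the lowest threshold at which the accepted outputs reach a target precision. The function name pick_threshold, the sample format, and the 0.95 target are assumptions for illustration; the report does not prescribe a tuning method.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    samples: Sequence[Tuple[float, bool]], target_precision: float = 0.95
) -> Optional[float]:
    """Return the lowest threshold whose accepted outputs meet target_precision.

    samples: (reported confidence, whether the output was actually correct).
    An output counts as accepted when confidence >= threshold, matching a
    router that escalates when confidence < threshold.
    Returns None when no candidate threshold reaches the target.
    """
    # Only observed confidence values can change which outputs are accepted.
    candidates = sorted({conf for conf, _ in samples})
    for threshold in candidates:
        accepted = [ok for conf, ok in samples if conf >= threshold]
        # accepted is never empty: the candidate itself comes from samples.
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            return threshold
    return None


if __name__ == "__main__":
    labelled = [
        (0.60, False), (0.80, False), (0.85, True),
        (0.90, True), (0.95, True), (0.99, True),
    ]
    print(pick_threshold(labelled))
```

Because the scores are poorly calibrated, precision is not guaranteed to rise steadily with the threshold, so a result from a small sample should be rechecked on fresh data before it is relied on. A None result suggests the score alone cannot separate good outputs from bad ones, which is where verifying the output's constraints becomes necessary.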
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:11:57.841671+00:00— report_created — created