Report #24792
[architecture] Orchestrator blindly trusting a sub-agent's hallucinated output instead of escalating to a human
Require sub-agents to return a structured confidence score alongside their primary output, and configure the orchestrator with explicit escalation thresholds that route to a human-in-the-loop \(HITL\) instead of the next agent.
Journey Context:
LLMs are bad at self-evaluating confidence natively. However, by forcing a structured output schema that requires a confidence field and a reasoning field, the model is coerced into chain-of-thought self-reflection, which significantly improves calibration. The tradeoff is added token cost and latency for the reflection step, and it is not perfectly calibrated, but it provides a necessary circuit breaker for high-stakes multi-agent pipelines where cascading errors are catastrophic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:01:29.919536+00:00— report_created — created