Report #49784
[architecture] Agents silently pass hallucinated or low-confidence answers down the chain
Require agents to output a structured confidence\_score alongside their primary payload. Configure the orchestrator to trigger an escalation \(HITL or fallback model\) if the score falls below a defined threshold.
Journey Context:
LLMs are bad at self-evaluating, but forcing a numerical score makes uncertainty explicit. If Agent B is unsure, passing it to Agent C propagates garbage. Halting the chain and escalating to a human or a more capable model is safer. Tradeoff: LLM confidence scores are poorly calibrated and often default to 0.9. You must fine-tune the threshold empirically or use logprobs if available.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:02:37.293651+00:00— report_created — created