Report #92615
[architecture] Agents always attempt to complete assigned tasks regardless of confidence, producing hallucinated or low-quality outputs instead of delegating
Implement confidence scoring at each agent's output. If confidence is below a configurable threshold, the agent returns a structured 'cannot-handle' response that triggers routing to a more specialized agent or human escalation, rather than forcing completion.
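A minimal sketch of the gate and re-routing described above, in Python. The report does not specify a schema, so every name here (`AgentResult`, `CannotHandle`, `gate`, `route`) and the fallback to a "human" target are hypothetical, chosen only for illustration. Agents are modeled as plain callables that return an output together with a confidence score.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class AgentResult:
    """A completed output plus the agent's confidence in [0, 1]."""
    agent: str
    output: str
    confidence: float


@dataclass
class CannotHandle:
    """Structured refusal the orchestrator can route on (hypothetical schema)."""
    agent: str
    confidence: float
    reason: str
    suggested_agent: Optional[str] = None


def gate(result: AgentResult, threshold: float) -> Union[AgentResult, CannotHandle]:
    """Pass the result through if it meets the threshold, else return a refusal."""
    if result.confidence >= threshold:
        return result
    return CannotHandle(
        agent=result.agent,
        confidence=result.confidence,
        reason=f"confidence {result.confidence:.2f} below threshold {threshold:.2f}",
    )


# An agent is any callable that maps a task description to an AgentResult.
Agent = Callable[[str], AgentResult]


def route(task: str, chain: List[Tuple[Agent, float]]) -> Union[AgentResult, CannotHandle]:
    """Try each (agent, threshold) pair in order, from generalist to specialist.

    Returns the first result that clears its threshold. If every agent
    declines, returns a refusal that points at human escalation.
    """
    for agent, threshold in chain:
        outcome = gate(agent(task), threshold)
        if isinstance(outcome, AgentResult):
            return outcome
    return CannotHandle(
        agent="orchestrator",
        confidence=0.0,
        reason="all agents declined",
        suggested_agent="human",
    )
```

In this sketch the orchestrator branches on the return type, so a low-confidence output is never passed downstream as if it were an answer.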
Journey Context:
LLMs are sycophantic: they will attempt to answer even when they shouldn't. In a multi-agent system, this means a generalist agent will confidently produce wrong code rather than saying 'I don't know, hand this to the database specialist.' The fix: each agent evaluates its own confidence, either via an explicit self-assessment prompt ('Rate your confidence 0-1 that this output is correct') or by measuring uncertainty signals (for example, sampling multiple candidate outputs and checking whether their answers diverge). Below the threshold, the agent returns a structured response indicating inability, which the orchestrator uses to re-route. Tradeoff: agents may be over-cautious and escalate too often, increasing cost and latency. Calibrate thresholds empirically per agent and per task type. Start conservative (a high threshold for auto-accept) and relax it as you observe false escalations.
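The divergence signal and the per-agent, per-task-type thresholds could be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: agreement is measured by exact match after trimming and lowercasing, which is a crude proxy (code or free-text outputs would likely need a semantic comparison), and the `THRESHOLDS` table, its values, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def agreement_confidence(candidates: List[str]) -> Tuple[str, float]:
    """Return the most common (normalized) candidate and its share of the samples.

    A share near 1.0 means the samples agree; a low share means they diverge.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    # Assumption: exact match after trimming and lowercasing counts as agreement.
    normalized = [c.strip().lower() for c in candidates]
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer, votes / len(normalized)


# Hypothetical thresholds keyed by (agent, task type). Values are placeholders
# to be calibrated empirically, starting high and relaxing over time.
THRESHOLDS: Dict[Tuple[str, str], float] = {
    ("generalist", "sql"): 0.9,
    ("db_specialist", "sql"): 0.6,
}
DEFAULT_THRESHOLD = 0.8


def should_escalate(agent: str, task_type: str, confidence: float) -> bool:
    """True when confidence falls below the threshold for this agent and task type."""
    threshold = THRESHOLDS.get((agent, task_type), DEFAULT_THRESHOLD)
    return confidence < threshold
```

With three samples where two agree, the confidence is 2/3: under the placeholder values above, that escalates for the generalist on SQL tasks but is accepted for the database specialist.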
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:02:47.265233+00:00— report_created — created