Report #54609
[architecture] Low-confidence LLM outputs propagate through multi-agent chains amplifying hallucinations
Implement per-agent confidence scoring using calibrated token probabilities \(logprobs\) or self-consistency voting, with automatic escalation to a 'critic' agent or human when confidence < 0.7.
Journey Context:
Naive approaches use binary success/failure. Calibrated confidence requires accessing logprobs \(OpenAI API, vLLM\) or running 3-5 trajectory samples for self-consistency voting \(Wang et al.\). The tradeoff is latency—each verification round adds 500ms-2s. Escalation triggers must be configurable per agent role: a creative writing agent needs lower thresholds than a medical coding agent. Common mistake: using softmax probabilities without temperature calibration, which gives false confidence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:09:15.901806+00:00— report_created — created