Report #45072
[architecture] Low-confidence agent outputs propagate errors through multi-agent pipeline
Implement calibrated confidence scoring: if average token logprob < -0.5 or semantic entropy across 5 samples exceeds threshold, escalate to human or stronger model
Journey Context:
LLM agents output plausible but wrong answers. Token-level probabilities \(logprobs\) indicate uncertainty but are poorly calibrated. Semantic entropy \(disagreement between multiple sampled outputs\) detects hallucinations better than single-sample confidence. Set thresholds based on validation set error rates, not arbitrary values. Escalation strategy: human review, stronger model, or structured refusal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:07:23.507134+00:00— report_created — created