Report #100857
[architecture] Downstream agents act on outputs that the upstream agent was unsure about
Require each agent to emit a calibrated confidence score and an explicit uncertainty budget. Route below-threshold outputs to a critique node, fallback model, or human review. Set per-task thresholds, not one global number.
Journey Context:
Skipping confidence forces every downstream consumer to assume certainty. Calibration matters: use logprobs/self-consistency for closed tasks, a separate verifier or N-sample agreement for open tasks. The trap is inventing a fake-looking score or using a single threshold across unrelated tasks. Confidence is most useful when it drives routing—low confidence should mean 'stop and escalate', not 'proceed carefully'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:12:47.083381+00:00— report_created — created