Report #55733
[architecture] Fixed human review checkpoints create bottlenecks on easy cases while missing subtle errors in high-stakes agent outputs
Implement Learning to Defer \(LtD\): train a deferral model, using plugin networks or confidence thresholds, that routes a decision to a human when Expected Utility\(human decision\) - cost\(human time\) > Expected Utility\(agent decision\). Calibrate the utility estimates on historical error costs.
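The routing rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `DeferralCosts`, `expected_utility`, and `should_defer` are hypothetical names, and the utility model \(a correct decision is worth 0, a wrong one costs `error_cost`\) is an assumption chosen to keep the arithmetic visible. In practice the correctness probabilities would come from a trained deferral model and the costs from historical error data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeferralCosts:
    """Hypothetical utility parameters, to be calibrated on historical error costs."""
    error_cost: float       # utility lost when a wrong decision ships
    human_time_cost: float  # utility cost of one human review


def expected_utility(p_correct: float, error_cost: float) -> float:
    """Assumed utility model: correct decision = 0, wrong decision = -error_cost."""
    return -(1.0 - p_correct) * error_cost


def should_defer(p_agent_correct: float, p_human_correct: float,
                 costs: DeferralCosts) -> bool:
    """Defer when EU(human decision) - cost(human time) > EU(agent decision)."""
    eu_agent = expected_utility(p_agent_correct, costs.error_cost)
    eu_human = expected_utility(p_human_correct, costs.error_cost)
    return eu_human - costs.human_time_cost > eu_agent


# Same probabilities, different stakes: only the high-stakes case is deferred.
low_stakes = DeferralCosts(error_cost=100.0, human_time_cost=5.0)
high_stakes = DeferralCosts(error_cost=10_000.0, human_time_cost=5.0)
print(should_defer(0.99, 0.995, low_stakes))   # agent keeps the decision
print(should_defer(0.99, 0.995, high_stakes))  # routed to a human
```

The point of the comparison is that the same agent accuracy can justify deferral or not depending on the cost of an error, which a fixed "every Nth request" rule cannot express.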
Journey Context:
Random sampling or fixed-interval review \(for example, every 10th request\) wastes expert attention on low-stakes decisions while missing high-risk ones. LtD, which comes from decision theory, optimizes the utility of the joint human-AI system instead. Common error: using raw model confidence as the deferral signal. Models are overconfident on outliers, so the cases that most need review are the ones least likely to be flagged. Tradeoff: LtD requires labeled data on human decisions \(expensive\) and quantified utility functions \(subjective\), but it significantly outperforms fixed rules. Alternative: conformal prediction \(see entry 5\) provides distribution-free guarantees but may be more conservative.
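One way to avoid deferring on raw confidence is to map it to empirical accuracy measured on held-out labeled decisions. The sketch below uses simple histogram binning as an illustration. This is a swapped-in calibration technique, not something the report prescribes, and the function names, the bin count, and the midpoint fallback for empty bins are all assumptions.

```python
from typing import List, Sequence


def fit_binned_calibrator(confidences: Sequence[float],
                          correct: Sequence[bool],
                          n_bins: int = 10) -> List[float]:
    """Return the empirical accuracy observed in each confidence bin."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        counts[b] += 1
        hits[b] += int(ok)
    # Assumption: a bin with no data falls back to its midpoint.
    return [hits[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def calibrated_confidence(raw: float, bin_accuracy: Sequence[float]) -> float:
    """Replace a raw confidence score with the accuracy seen in its bin."""
    n_bins = len(bin_accuracy)
    return bin_accuracy[min(int(raw * n_bins), n_bins - 1)]


# Toy held-out set: the model reports 0.95 but is right only half the time there.
held_out_conf = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
held_out_correct = [True, True, False, False, True, False]
table = fit_binned_calibrator(held_out_conf, held_out_correct)
print(calibrated_confidence(0.97, table))
```

The calibrated value, rather than the raw score, is what would feed the agent-side correctness estimate in the expected-utility comparison. Binning only corrects overconfidence in regions the held-out data covers, so true outliers still need a separate signal.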
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:02:29.648748+00:00— report_created — created