Report #82404
[synthesis] Why improving AI accuracy metrics can make the product worse for users
Track the failure-mode distribution, not just aggregate accuracy. Measure the 'detectable failure rate' (failures users can catch and correct) separately from the 'undetectable failure rate' (failures users act on as if correct). Weight undetectable failures 5-10x higher in product quality scores, and set a separate threshold for each rate. When an accuracy improvement shifts the failure distribution from detectable to undetectable, treat it as a regression, not an improvement.
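A minimal sketch of how this scoring and gating could look, assuming both failure rates are already measured as fractions of all responses. The names `FailureProfile`, `weighted_failure_score`, and `release_problems` are hypothetical, and reading "shift toward undetectable" as "the undetectable rate rose" is only one possible interpretation of the rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureProfile:
    """Failure rates for one model version, as fractions of all responses."""
    detectable: float    # failures users catch and correct
    undetectable: float  # failures users act on as if correct

    @property
    def aggregate(self) -> float:
        """Plain failure rate, i.e. 1 - aggregate accuracy."""
        return self.detectable + self.undetectable


def weighted_failure_score(profile: FailureProfile,
                           undetectable_weight: float = 5.0) -> float:
    """Quality penalty (lower is better).

    The default weight of 5.0 is the low end of the suggested 5-10x range;
    it is an assumption to tune per product, not a measured constant.
    """
    return profile.detectable + undetectable_weight * profile.undetectable


def release_problems(old: FailureProfile, new: FailureProfile,
                     max_detectable: float,
                     max_undetectable: float) -> list[str]:
    """Return the reasons to block a release; an empty list means it passes."""
    problems = []
    if new.detectable > max_detectable:
        problems.append("detectable failure rate above threshold")
    if new.undetectable > max_undetectable:
        problems.append("undetectable failure rate above threshold")
    # Assumed reading of the regression rule: any rise in the undetectable
    # rate counts as a regression, even if aggregate accuracy improved.
    if new.undetectable > old.undetectable:
        problems.append("failures shifted toward undetectable")
    return problems


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    old = FailureProfile(detectable=0.16, undetectable=0.04)
    new = FailureProfile(detectable=0.01, undetectable=0.06)
    print("aggregate:", old.aggregate, "->", new.aggregate)
    print("weighted (10x):", weighted_failure_score(old, 10.0),
          "->", weighted_failure_score(new, 10.0))
    print("problems:", release_problems(old, new, 0.10, 0.05))
```

With these made-up numbers the aggregate failure rate falls while the 10x-weighted score rises, and the gate flags the release. With a 5x weight the weighted score still improves, so the explicit check on the undetectable rate may be worth keeping alongside the weighted score.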
Journey Context:
In traditional software, reducing the bug rate almost always improves the product. In AI, improving aggregate accuracy can make the product more dangerous if it shifts the failure-mode distribution from detectable to undetectable. Example: reducing the hallucination rate from 20% to 5% might eliminate the obviously wrong hallucinations (which users catch and ignore) while leaving the plausible-but-wrong ones (which users act on without verification). The aggregate metric improves, but user harm can increase because users are now more likely to trust and act on the wrong answers that remain. Combining ML evaluation methodology with product trust dynamics suggests that the relevant metric isn't accuracy; it's 'accuracy weighted by the harm of undetected errors.' This has little analog in traditional software, where bugs are nearly always detectable (the software crashes, throws an error, or produces clearly wrong output). AI's distinctive danger is the plausible failure: the output that looks right enough to act on but isn't. As models improve, failures become rarer but also harder to detect, so each remaining failure can cause more harm because users' guard is down.
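The 20% to 5% example can be made concrete with a small calculation. Only the two aggregate rates come from the text. The split into obvious and plausible hallucinations and the user verification rates are invented for illustration, and `acted_on_error_rate` is a hypothetical helper. The sketch assumes obviously wrong outputs are always caught and that a plausible-but-wrong output is caught only when the user verifies it.

```python
def acted_on_error_rate(plausible_wrong_rate: float,
                        verification_rate: float) -> float:
    """Fraction of all responses that are wrong AND acted on uncaught.

    plausible_wrong_rate: share of responses that are plausible but wrong.
    verification_rate: share of responses the user double-checks (assumed
    to catch every plausible-but-wrong output that is checked).
    """
    return plausible_wrong_rate * (1.0 - verification_rate)


# Hypothetical "before": 20% hallucinations = 16% obvious + 4% plausible.
# Users see obvious errors often, so assume they verify half of all answers.
before = acted_on_error_rate(plausible_wrong_rate=0.04, verification_rate=0.50)

# Hypothetical "after": 5% hallucinations = 1% obvious + 4% plausible.
# Obvious errors are now rare, so assume users verify only 10% of answers.
after = acted_on_error_rate(plausible_wrong_rate=0.04, verification_rate=0.10)

print(f"acted-on error rate before: {before:.3f}")
print(f"acted-on error rate after:  {after:.3f}")
```

Under these assumptions the share of responses that are wrong and acted on rises from about 2% to about 3.6%, even though the hallucination rate fell by a factor of four. Different assumed verification rates would change the size of the effect and could reverse it, so the inputs need to be measured rather than guessed.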
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:54:27.643570+00:00— report_created — created