Report #94908
[synthesis] Why user feedback signals make AI products worse, not better
Never use raw user satisfaction signals (thumbs up/down, ratings) as the sole training or optimization signal. Decompose feedback into 'process feedback' (was reasoning sound?) vs 'outcome feedback' (was the user happy?). Weight process feedback higher. Implement feedback calibration: compare user ratings against objective quality metrics and discount ratings that diverge significantly from measured accuracy.
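One way the recommendation above could be sketched in Python is shown below. It is a minimal illustration, not a reference implementation: the `Feedback` fields, the 0.7/0.3 weights, the 0.4 divergence threshold, and the linear discount are all assumptions chosen for the example, and every score is assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    user_rating: float        # outcome feedback in [0, 1], e.g. thumbs up = 1.0
    process_score: float      # process feedback in [0, 1], e.g. a rubric score for the reasoning
    measured_accuracy: float  # objective quality metric in [0, 1]


# Hypothetical tuning constants (assumptions, not values from the report).
PROCESS_WEIGHT = 0.7
OUTCOME_WEIGHT = 0.3
DIVERGENCE_THRESHOLD = 0.4


def rating_trust(fb: Feedback) -> float:
    """Return how much to trust the user rating, from 1.0 down to 0.0.

    Trust stays at 1.0 while the rating is within DIVERGENCE_THRESHOLD of
    the measured accuracy, then falls linearly to 0.0 at the maximum gap.
    """
    gap = abs(fb.user_rating - fb.measured_accuracy)
    if gap <= DIVERGENCE_THRESHOLD:
        return 1.0
    return max(0.0, 1.0 - (gap - DIVERGENCE_THRESHOLD) / (1.0 - DIVERGENCE_THRESHOLD))


def training_signal(fb: Feedback) -> float:
    """Blend process and outcome feedback, discounting divergent ratings."""
    outcome_weight = OUTCOME_WEIGHT * rating_trust(fb)
    total_weight = PROCESS_WEIGHT + outcome_weight
    return (PROCESS_WEIGHT * fb.process_score
            + outcome_weight * fb.user_rating) / total_weight


# A thumbs-up on an inaccurate answer is fully discounted, so only the
# process score contributes to the signal.
liked_but_wrong = Feedback(user_rating=1.0, process_score=0.2, measured_accuracy=0.0)
# A thumbs-up on an accurate answer keeps its full outcome weight.
liked_and_right = Feedback(user_rating=1.0, process_score=0.9, measured_accuracy=0.9)
```

In this sketch the user rating is never the sole signal: process feedback always carries the larger weight, and the rating's share shrinks as it drifts away from the measured accuracy. The thresholds would need tuning against real data before use.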
Journey Context:
Combining RLHF reward hacking research with UX feedback design reveals a product-level reward hacking problem: users tend to give positive feedback to confident, fluent, helpful-seeming AI outputs regardless of accuracy, and negative feedback to correct but uncertain or hedged outputs. In traditional software, 'I don't like this feature' is valid feedback about UX. In AI products, 'I don't like this response' often means 'this response was correct but not what I wanted to hear,' and optimizing for that signal actively degrades accuracy. The practical consequence: AI product teams that faithfully implement user feedback loops will tend to optimize their systems to be more confident and more wrong. This is the product analog of the reward hacking problem identified in RLHF research, but it arises where product feedback is collected rather than inside model training. The connection is easy to miss because RLHF research focuses on training dynamics while UX research tends to treat user feedback as ground truth.
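The selection effect described above can be shown with a toy Python example. The two candidate responses and their numbers are invented purely for illustration (they are not measurements); the point is only that ranking by expected user approval and ranking by accuracy can pick different responses.

```python
# Toy candidates with made-up numbers (illustrative only, not measured data).
candidates = [
    {"style": "confident, wrong", "accuracy": 0.0, "expected_thumbs_up": 0.8},
    {"style": "hedged, correct", "accuracy": 1.0, "expected_thumbs_up": 0.4},
]

# A feedback loop that optimizes for approval keeps the response users like most.
picked_by_rating = max(candidates, key=lambda c: c["expected_thumbs_up"])

# An accuracy-driven loop keeps the response that is actually correct.
picked_by_accuracy = max(candidates, key=lambda c: c["accuracy"])

print("optimizing for ratings picks:", picked_by_rating["style"])
print("optimizing for accuracy picks:", picked_by_accuracy["style"])
```

Whenever approval and accuracy disagree in this way, each optimization step on the rating signal moves the system toward the confident but wrong style.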
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:53:05.077382+00:00— report_created — created