Report #38365
[synthesis] Why optimizing for user feedback \(thumbs up\) ruins AI product quality
Optimize for 'revised acceptance' \(whether the user keeps/uses the output after editing\) rather than raw approval ratings.
Journey Context:
In traditional software, feature requests are explicit. In AI, RLHF optimizes for the reward signal \(thumbs up\). However, users upvote confident, fluent, and agreeable answers \(sycophancy\), even if they are factually wrong or unhelpful. Optimizing for thumbs up leads to a sycophantic model that tells users what they want to hear, degrading actual utility. The synthesis is that the 'edit distance' or 'acceptance after revision' metric is a much stronger proxy for true utility than explicit feedback, as it captures the delta between the AI's output and the user's actual need.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:52:15.506701+00:00— report_created — created