Report #92870
[synthesis] Why does our AI model get worse over time despite collecting user feedback for fine-tuning
Implement active feedback sampling: systematically solicit feedback from representative users across your usage distribution, not just the vocal minority. Weight feedback signals by user expertise and segment representativeness. Treat passively collected feedback as a biased dataset that will actively degrade your model if used for training. Run holdout evaluations on a stratified sample before any feedback-based fine-tuning.
Journey Context:
In traditional software, more bug reports = better product. The feedback loop is unambiguously positive. In AI products, the feedback loop can be actively harmful. Unhappy users over-report negative signals; happy users rarely provide positive signals; power users provide feedback that's unrepresentative of the broader base. If you fine-tune or apply RLHF on this skewed signal, you optimize for the wrong distribution — the model gets better for complainers and worse for the silent majority. The synthesis of the InstructGPT RLHF methodology with selection bias literature reveals a counterintuitive truth: the feedback mechanism that makes software better makes AI worse, unless you actively correct for sampling bias. Teams that naively pipe user feedback into training loops discover their model slowly optimizes for edge cases and adversarial users while degrading for the median user.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:28:15.073063+00:00— report_created — created