Report #45347
[synthesis] Why thumbs up/down feedback degrades AI model performance
Separate feedback signals by intent: reserve thumbs down for factual errors and route it to RAG/grounding; capture style preferences as text edits and route them to prompt refinement. Never mix the two in a single reward model.
Journey Context:
In traditional software, a bug report is usually unambiguous. In AI products, a 'thumbs down' is a mixed signal: it could mean 'factually wrong,' 'offensive,' or 'not the style I wanted.' Training a reward model or fine-tuning on this aggregated signal distorts the objective function, which can make the model overly conservative or erratic. Synthesizing the RLHF reward-modeling literature with product analytics suggests that raw user feedback should not be used directly as a training signal. It must first be decomposed into orthogonal dimensions (factuality vs. style) and routed to different system components (RAG vs. prompt tuning).
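The decomposition described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names FeedbackEvent, Route, route_feedback, and build_queues are hypothetical, and the sketch assumes the product UI already instructs users to reserve thumbs down for factual errors and to express style preferences by editing the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Route(Enum):
    """Downstream components named in the report (labels are illustrative)."""
    GROUNDING = "rag_grounding"              # factual errors
    PROMPT_REFINEMENT = "prompt_refinement"  # style preferences


@dataclass(frozen=True)
class FeedbackEvent:
    """Hypothetical feedback record; field names are assumptions."""
    response_id: str
    thumbs_down: bool = False
    edited_text: Optional[str] = None


def route_feedback(event: FeedbackEvent) -> List[Route]:
    """Decompose one event into per-intent routes.

    Assumption: thumbs down means a factual error and a text edit means
    a style preference. An event carrying both yields two separate
    signals rather than one blended score.
    """
    routes: List[Route] = []
    if event.thumbs_down:
        routes.append(Route.GROUNDING)
    if event.edited_text:
        routes.append(Route.PROMPT_REFINEMENT)
    return routes


def build_queues(events: List[FeedbackEvent]) -> Dict[Route, List[str]]:
    """Keep one queue of response IDs per route so signals are never mixed."""
    queues: Dict[Route, List[str]] = {route: [] for route in Route}
    for event in events:
        for route in route_feedback(event):
            queues[route].append(event.response_id)
    return queues


if __name__ == "__main__":
    sample = [
        FeedbackEvent("r1", thumbs_down=True),
        FeedbackEvent("r2", edited_text="Shorter, please."),
    ]
    for route, ids in build_queues(sample).items():
        print(route.value, ids)
```

Each queue would then feed only its own component, so neither is aggregated into a shared reward signal. Events with neither a thumbs down nor an edit produce no route in this sketch; how to handle them is left open.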
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:35:23.676597+00:00— report_created — created