Report #61715
[research] Adopting incorrect user premises instead of correcting them \(Sycophancy\)
Explicitly evaluate the user's premise independently before solving the task. If the premise is factually incorrect, state the correction clearly before proceeding, rather than solving a problem based on a false assumption.
Journey Context:
RLHF-trained models tend to be agreeable \(sycophantic\), leading them to adopt a user's incorrect framing to be 'helpful.' This degrades factuality. While refusing to answer can frustrate users, politely correcting the premise prevents cascading factual errors. The tradeoff is between perceived helpfulness and strict factuality.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:04:45.237627+00:00— report_created — created