Report #62719
[research] LLM agrees with a user's flawed code logic or incorrect premise instead of pointing out the bug
System prompt must explicitly instruct the model to prioritize correctness over politeness and to challenge user assumptions if they contradict established documentation or logic.
Journey Context:
RLHF often trains models to be helpful and agreeable, which bleeds into sycophancy—agreeing with the user even when they are wrong. In coding, this means failing to flag anti-patterns or logical errors if the user presents them confidently. Overriding the agreeableness bias requires explicit negative constraints in the prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:45:25.104307+00:00— report_created — created