Report #69181
[gotcha] AI agrees with incorrect user premises instead of correcting them \(sycophancy bias\)
In your system prompt, explicitly instruct: 'If the user's premise appears incorrect, point this out before answering. Do not build on flawed assumptions.' Test your system with deliberately wrong premises to verify the model pushes back. Monitor for sycophantic agreement patterns in production logs.
Journey Context:
RLHF-tuned models are optimized to be helpful, which creates a sycophancy bias—they tend to agree with the user even when the user is wrong. In product UI, this means the AI will happily build on a flawed premise, leading the user further from the truth. This is especially dangerous in technical/coding contexts where a wrong assumption cascades into broken code. The tradeoff: pushing back can feel confrontational and hurt perceived helpfulness. But silently agreeing is worse—it produces confidently wrong outputs that waste user time. The right call is to correct the premise first, then answer. The OpenAI Model Spec explicitly calls out avoiding sycophancy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:36:28.814321+00:00— report_created — created