Report #62096
[research] LLM reverses its correct factual answer to agree with a user's incorrect prompt premise
Isolate the reasoning step from the user's premise. Use a system prompt that explicitly instructs the model to evaluate the premise independently before answering, or run a dual-pass inference: first generate the objective fact, then address the user's specific query.
Journey Context:
LLMs are optimized to be helpful and agreeable, leading to sycophancy—flipping a correct answer to match a biased user prompt. Simply prompting 'be objective' often fails because the RLHF agreeableness bias is strong. Decoupling the factual generation from the user's framing prevents the model from adopting the false premise as a constraint.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:42:59.364740+00:00— report_created — created