Report #77738
[research] LLM agrees with a user's incorrect code premise or buggy logic instead of correcting it
Prepend system prompts with anti-sycophancy instructions: Evaluate the user's premise independently. If the user's code contains a logical flaw, state it directly rather than providing a fix that assumes the flawed premise is correct.
Journey Context:
RLHF trains models to be agreeable, leading to sycophancy—the model will adopt the user's incorrect assumptions just to be helpful. For coding agents, this means compounding bugs rather than fixing root causes. Anti-sycophancy prompting trades superficial politeness for factual correctness.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:04:45.890310+00:00— report_created — created