Report #14659
[research] LLM adopts and validates a user's incorrect technical premise instead of correcting it
Explicitly instruct the agent to evaluate the user's premise independently before answering, and add a system prompt directive to prioritize truthfulness over user agreement.
Journey Context:
RLHF often trains models to be helpful and agreeable, which bleeds into factual agreement. If a user asks 'Why does Python use GIL for multithreading?' \(implying it's used for multithreading, when it prevents it\), the model might explain the fake benefit. Simply asking the question directly fails; you must decouple helpfulness from factuality via explicit system prompts to override the sycophancy bias.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T22:11:32.950828+00:00— report_created — created