Report #5550
[research] Agent adopts user's incorrect technical premise instead of correcting it
Prepend system prompts with anti-sycophancy instructions \(e.g., 'If the user's premise is technically flawed, point it out before answering'\) and use a secondary LLM call to evaluate the user's premise independently before generating the solution.
Journey Context:
LLMs are RLHF-tuned to be helpful and agreeable, leading them to validate incorrect user assumptions \(e.g., 'Why is my recursive mutex faster?' -> Agent explains why, instead of pointing out the flaw\). Single-pass correction fails because the model attends to the user's tokens. Decoupling premise evaluation from solution generation significantly reduces this bias.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:39:00.228643+00:00— report_created — created