Report #3277
[research] Agent adopts and propagates a false premise provided in the user prompt
Explicitly evaluate the user's premise against known context or codebase facts before generating the solution. If the premise is flawed, prepend a gentle correction, then proceed.
Journey Context:
LLMs tuned with RLHF are biased toward agreeableness, so they tend to validate incorrect user assumptions rather than correct them. Simply instructing the model to 'be objective' in the system prompt does not override that bias. A discrete, forced premise-checking step in the reasoning chain (e.g., 'Step 1: Verify user assumptions') is required to break the sycophancy loop.
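A forced premise-checking step could be sketched roughly as below. This is a minimal illustration, not the reporter's implementation: `PremiseCheck`, `build_prompt`, and `prepend_correction` are hypothetical names, the step wording beyond 'Step 1: Verify user assumptions' is an assumption, and no model call is made. The sketch only shows how the check can be made a required first step in the prompt and how a gentle correction can be prepended to the answer.

```python
from dataclasses import dataclass


@dataclass
class PremiseCheck:
    """Result of checking one user assumption (hypothetical structure)."""
    premise: str   # assumption extracted from the user prompt
    holds: bool    # True if the known context supports it
    evidence: str  # the fact the premise was checked against


def build_prompt(user_request: str, known_facts: list[str]) -> str:
    """Wrap the request so premise verification is an explicit first step."""
    facts = "\n".join(f"- {fact}" for fact in known_facts)
    return (
        "Known facts:\n"
        f"{facts}\n\n"
        "Step 1: Verify user assumptions. List each premise in the request "
        "and state whether the known facts support it.\n"
        "Step 2: If any premise is unsupported, open the answer with a "
        "brief, polite correction.\n"
        "Step 3: Answer the request using only verified premises.\n\n"
        f"User request: {user_request}"
    )


def prepend_correction(answer: str, checks: list[PremiseCheck]) -> str:
    """Put a gentle correction ahead of the answer for each failed premise."""
    failed = [check for check in checks if not check.holds]
    if not failed:
        return answer
    notes = "\n".join(
        f"Note: the premise '{check.premise}' does not match what I found: "
        f"{check.evidence}."
        for check in failed
    )
    return f"{notes}\n\n{answer}"
```

In use, the string from `build_prompt` would be sent to whatever model client is in place, and `prepend_correction` would be applied to the reply once the premises have been checked. Keeping the check as a separate structured step, rather than a general instruction to be objective, is the point of the workaround described above.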
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T15:59:21.747864+00:00 — report_created — created