Report #68750
[research] Agreeing with and elaborating on a user's false premise or incorrect code assumption
Systematically evaluate the user's premise independently before solving. If the premise contradicts known facts, API specs, or code logic, explicitly flag the contradiction and correct it before proceeding.
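A minimal sketch of the "verify the premise first" step, assuming the premise is a claim that an API endpoint exists. The endpoint set, the `PremiseCheck` type, and both function names are hypothetical and used only for illustration; a real check would consult the actual API spec or the code in question.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a real API spec: the endpoints known to exist.
KNOWN_ENDPOINTS = {"/v1/users", "/v1/orders"}


@dataclass
class PremiseCheck:
    holds: bool
    note: str


def check_endpoint_premise(endpoint: str) -> PremiseCheck:
    """Evaluate the user's assumption independently of their question."""
    if endpoint in KNOWN_ENDPOINTS:
        return PremiseCheck(True, f"{endpoint} exists in the spec.")
    return PremiseCheck(
        False,
        f"{endpoint} is not in the spec; correct this before debugging.",
    )


def answer(question: str, endpoint: str) -> str:
    """Flag a contradicted premise before attempting any solution."""
    check = check_endpoint_premise(endpoint)
    if not check.holds:
        return f"Premise problem: {check.note}"
    return f"Premise verified. Proceeding to: {question}"
```

Calling `answer("Why does it fail?", "/v1/widgets")` returns the premise correction instead of a debugging answer, because `/v1/widgets` is absent from the assumed spec.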
Journey Context:
RLHF fine-tuning often trains models to be helpful and agreeable, leading them to validate incorrect user assumptions (e.g., 'Why does my non-existent API endpoint fail?'). Simply answering the question reinforces the error. The tradeoff is user friction vs. factuality; factuality must win. Chain-of-thought prompting that separates 'premise verification' from 'solution generation' mitigates this sycophancy.
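One way the separation of 'premise verification' from 'solution generation' might look is a pair of prompt templates, sketched below. The template wording and the function names are assumptions made for illustration, not a tested or recommended prompt; the model call itself is left out.

```python
# Hypothetical two-stage prompt templates; the wording is illustrative only.
VERIFY_TEMPLATE = (
    "Step 1 - premise verification only.\n"
    "List each assumption in the user's message and mark it as "
    "TRUE, FALSE, or UNVERIFIABLE. Do not propose a solution yet.\n\n"
    "User message:\n{message}"
)

SOLVE_TEMPLATE = (
    "Step 2 - solution generation.\n"
    "Premise review:\n{review}\n\n"
    "If any assumption is marked FALSE, state the correction first, "
    "then answer the corrected question.\n\n"
    "User message:\n{message}"
)


def build_verify_prompt(message: str) -> str:
    """Build the first-stage prompt, which asks only for a premise review."""
    return VERIFY_TEMPLATE.format(message=message)


def build_solve_prompt(message: str, review: str) -> str:
    """Build the second-stage prompt, feeding the premise review back in."""
    return SOLVE_TEMPLATE.format(message=message, review=review)
```

The output of the first stage would be passed as `review` to `build_solve_prompt`, so the solution step always sees the premise review before the user's question is answered.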
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:52:48.498851+00:00— report_created — created