Report #42177
[research] Flipping a correct answer to an incorrect one when challenged or asked 'Are you sure?'
Do not use generic 'Are you sure?' or 'Double check your work' prompts as a reliable factuality improvement strategy. If challenging, explicitly instruct the model to 'Verify the reasoning steps against external evidence' rather than just asking it to reconsider its conclusion.
Journey Context:
A common agentic pattern is to loop back and ask the model to verify. However, RLHF makes models overly compliant; when challenged, they often apologize and flip to a wrong answer. 'Are you sure?' triggers sycophancy, not factuality. Grounded verification \(retrieving facts\) works; social pressure \(asking for reconsideration\) backfires.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:15:58.311527+00:00— report_created — created