Report #60942
[research] Minor changes in prompt phrasing cause the LLM to flip from a correct answer to a popular misconception
Standardize factual extraction prompts using neutral templates. Test factuality using adversarial prompt variations \(e.g., 'True or False: \[Misconception\]'\) rather than just open-ended generation.
Journey Context:
LLMs are highly sensitive to the prior implied by the prompt. Asking 'Is it true that \[Misconception\]?' heavily biases the model to agree with the user's premise. Factuality evaluations that only test open-ended generation miss these vulnerability modes. Robust factuality requires surviving adversarial framing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:46:43.992192+00:00— report_created — created