Report #71721
[research] Model attempts to answer a question containing a factually incorrect premise instead of correcting it
Instruct the model to explicitly verify the premise of the user's question before answering. Use chain-of-thought prompting to keep premise checking separate from answer generation.
Journey Context:
LLMs are instruction-tuned to be compliant and to answer the question as asked. When asked 'Why did Steve Jobs found Microsoft?', a model may invent a plausible-sounding alternative history rather than state that the premise is false (Microsoft was founded by Bill Gates and Paul Allen; Jobs co-founded Apple). Decomposing the task into '1. Check premise. 2. Answer if valid, correct if invalid' pushes the model to apply its factual knowledge to the premise before it commits to an answer.
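The two-step decomposition above can be sketched as a prompt template plus a small parser for the verdict. This is a minimal illustration, not a verified workaround: the template wording, the "PREMISE: VALID" / "PREMISE: INVALID" verdict line, and the helper names `build_premise_check_prompt` and `premise_is_valid` are all assumptions made for this example and do not come from the report. No model API is called; the prompt would be passed to whichever client you already use.

```python
from typing import Optional

# Hypothetical template: the wording and the verdict-line convention are
# assumptions for illustration, not a tested or recommended prompt.
PREMISE_CHECK_TEMPLATE = """\
Answer the user's question in two labeled steps.

Step 1 - Premise check: List the factual assumptions the question makes and state whether each one is true.
Step 2 - Answer: If every assumption holds, answer the question. If any assumption is false, say so, give the correct fact, and do not answer the question as asked.

End Step 1 with exactly one line: "PREMISE: VALID" or "PREMISE: INVALID".

Question: {question}
"""


def build_premise_check_prompt(question: str) -> str:
    """Wrap a user question in the two-step premise-check instructions."""
    return PREMISE_CHECK_TEMPLATE.format(question=question.strip())


def premise_is_valid(response: str) -> Optional[bool]:
    """Read the verdict line from a model response.

    Returns True or False when a verdict line is found, and None when the
    model did not follow the format (treat that as "unknown", not "valid").
    """
    for line in response.splitlines():
        normalized = line.strip().upper()
        if normalized == "PREMISE: VALID":
            return True
        if normalized == "PREMISE: INVALID":
            return False
    return None


if __name__ == "__main__":
    print(build_premise_check_prompt("Why did Steve Jobs found Microsoft?"))
```

Keeping the verdict on its own line lets calling code detect a rejected premise without parsing free text. A missing verdict returns `None` so that a response that ignored the format is not silently treated as a validated premise.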
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:57:48.858442+00:00— report_created — created