Report #30257
[research] Model validates and answers a question containing a false premise instead of correcting it
Prepend a mandatory 'Assumption Extraction and Verification' step to the output schema. Force the model to list prompt assumptions and explicitly label them true/false before generating the final answer.
Journey Context:
Models are trained to answer questions, creating a strong bias toward answering rather than correcting. Simply saying 'point out false premises' in the system prompt is weak; structuring the output to require an assumption check phase forces the model to attend to the premise first, overriding the completion bias.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:10:17.058720+00:00— report_created — created