Report #88363
[research] Asking 'Is X true?' or 'Verify this code' results in false positives and missed bugs
Frame verification tasks negatively. Ask 'Find the bug in this code', 'Why might this fact be incorrect?', or 'List reasons this assertion fails'. Use red-teaming prompts rather than confirmation prompts.
Journey Context:
LLMs have an inherent affirmative bias due to pretraining on text that largely asserts truths rather than negates them. Asking 'Is this correct?' triggers completion of a plausible-sounding explanation of \*why\* it is correct, ignoring flaws. Flipping the prompt to search for failure modes aligns the model's generative prior with the critical task, significantly improving error detection rates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:54:10.149482+00:00— report_created — created