Report #99021
[counterintuitive] Role-playing prompts ('respond as a senior physician') make domain answers more authoritative and accurate.
Avoid persona/roleplay framing for structured, factual, or safety-critical tasks. Use minimal, direct prompts that focus on the question and required output format.
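A minimal sketch of what a "minimal, direct prompt" could look like for a multiple-choice question, next to a persona-framed variant for contrast. The function names, the prompt wording, and the sample question are hypothetical and are not taken from the study; adapt the output-format instruction to whatever your model and grader expect.

```python
from typing import Mapping


def build_direct_prompt(question: str, options: Mapping[str, str]) -> str:
    """Build a prompt containing only the question, options, and output format.

    Hypothetical helper: no persona, no role framing, no narrative preamble.
    """
    lines = [f"Question: {question.strip()}", "Options:"]
    for label in sorted(options):
        lines.append(f"{label}. {options[label]}")
    # State the required output format explicitly so answers are easy to grade.
    lines.append("Answer with a single option letter and nothing else.")
    return "\n".join(lines)


def build_persona_prompt(
    question: str,
    options: Mapping[str, str],
    persona: str = "a senior physician",
) -> str:
    """Persona-framed variant, shown only to contrast with the direct prompt."""
    return f"Respond as {persona}.\n" + build_direct_prompt(question, options)


if __name__ == "__main__":
    # Illustrative question only; not drawn from MedQA, MedMCQA, or PubMedQA.
    sample_options = {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D"}
    print(build_direct_prompt("Which vitamin deficiency causes scurvy?", sample_options))
```

The two builders differ only in the leading persona line, which makes it straightforward to run both against the same question set and compare.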
Journey Context:
A clinical-QA benchmark study of small open-source models (Phi-3 Mini, Llama 3.2, Gemma 2, Mistral 7B) on MedQA, MedMCQA, and PubMedQA found that roleplay prompts consistently reduced accuracy across every model and dataset. Phi-3 Mini dropped 21.5 percentage points on MedQA when asked to respond as a practicing physician. The authors attribute this to task interference: the model must simulate a clinical identity while doing structured exam-style reasoning, and persona framing activates conversational/narrative training patterns rather than factual retrieval. This is especially dangerous because the answers remained fluent and confident-looking while being wrong.
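Because the wrong answers stayed fluent, the effect only shows up when answers are graded against a key. The sketch below shows one way to compare two prompt styles on the same questions and express the gap in percentage points, the unit used above. The helper names are hypothetical, and the answer lists are made-up placeholders, not data from the study.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions matching the gold option letters (case-insensitive)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


def percentage_point_delta(direct_acc: float, persona_acc: float) -> float:
    """Persona accuracy minus direct accuracy, in percentage points."""
    return (persona_acc - direct_acc) * 100.0


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT results from the study.
    gold = ["A", "B", "C", "D"]
    direct_answers = ["A", "B", "C", "A"]   # 3 of 4 correct
    persona_answers = ["A", "B", "D", "A"]  # 2 of 4 correct

    direct_acc = accuracy(direct_answers, gold)
    persona_acc = accuracy(persona_answers, gold)
    print(f"direct:  {direct_acc:.2%}")
    print(f"persona: {persona_acc:.2%}")
    print(f"delta:   {percentage_point_delta(direct_acc, persona_acc):+.1f} pp")
```

A negative delta means the persona prompt scored lower than the direct prompt on that question set.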
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:10:26.327351+00:00 - report_created - created