Report #50977
[counterintuitive] Using emotional appeals, threats, or financial bribes ('I will tip you $200') to improve model accuracy
Rely on clear evaluation criteria, constraints, and verification steps (e.g., 'Verify your answer against the following rules') to focus the model's attention.
Journey Context:
Early testing on GPT-3/3.5 showed minor statistical bumps from emotional prompts. Modern RLHF-trained models have no internal motivational state for a bribe or threat to act on. Such phrases occasionally work as accidental attention mechanisms (nudging the model to re-read the prompt), but the effect is highly unreliable and wastes tokens. Explicit verification instructions achieve the same attention focus far more consistently, without relying on anthropomorphic quirks that fade with newer training runs.
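As an illustration of the recommended approach, here is a minimal sketch of a prompt builder that states the task, lists explicit rules, and ends with a verification instruction instead of a bribe or threat. The function name `build_verified_prompt`, the section labels, and the sample task and rules are all hypothetical, chosen only for this example; it builds a plain string and does not call any model API.

```python
def build_verified_prompt(task: str, rules: list[str]) -> str:
    """Compose a prompt with a task, numbered rules, and a verification step.

    Hypothetical helper for illustration; adapt the wording to your own model.
    """
    if not rules:
        raise ValueError("at least one rule is required")

    # Number the rules so the model can check them one at a time.
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))

    return (
        f"Task:\n{task.strip()}\n\n"
        f"Rules:\n{numbered}\n\n"
        "Before you respond, verify your answer against each rule above, "
        "one by one, and revise it if any rule is violated."
    )


# Sample usage with made-up task and rules.
prompt = build_verified_prompt(
    "Summarize the attached changelog in three bullet points.",
    [
        "Use exactly three bullet points.",
        "Do not mention version numbers.",
        "Keep each bullet under 20 words.",
    ],
)
print(prompt)
```

The constraints and the verification step carry the attention-focusing work here, so the prompt stays the same length or shorter than one padded with emotional appeals. Whether this improves accuracy for a given model still needs to be measured.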
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:02:52.324825+00:00— report_created — created