Report #94772
[counterintuitive] Using emotional framing or financial bribes ('I will tip you $200') to improve compliance
Define explicit success criteria, evaluation metrics, and constraints in the prompt instead of emotional leverage.
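A minimal sketch of this fix, assuming a plain-text prompt: the prompt is assembled from explicit sections (task, success criteria, evaluation, constraints) instead of emotional appeals. `PromptSpec`, its field names, and the `slugify` task are hypothetical illustrations, not a standard API, and this section layout is only one reasonable choice.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the parts of a prompt that define 'good' output."""

    task: str
    success_criteria: list[str] = field(default_factory=list)
    evaluation: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Each section is a plain, checkable requirement; nothing here appeals
        # to urgency, emotion, or a reward such as a tip.
        sections = [
            ("A successful response will:", self.success_criteria),
            ("It will be evaluated by:", self.evaluation),
            ("Constraints:", self.constraints),
        ]
        parts = [f"Task: {self.task}"]
        for heading, items in sections:
            if items:  # skip empty sections instead of printing a bare heading
                bullets = "\n".join(f"- {item}" for item in items)
                parts.append(f"{heading}\n{bullets}")
        return "\n\n".join(parts)


# Illustrative usage: stands in for "I'll tip you $200, this is really important!"
spec = PromptSpec(
    task="Write a Python function slugify(title: str) -> str.",
    success_criteria=[
        "compile without errors",
        "pass the test cases listed below",
    ],
    evaluation=[
        "slugify('Hello World') == 'hello-world'",
        "slugify('  Many   spaces ') == 'many-spaces'",
    ],
    constraints=["use only the Python standard library", "no explanation, code only"],
)
prompt = spec.render()
print(prompt)
```

Every line of the rendered prompt is something the output can be checked against, which is the point of replacing leverage with criteria.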
Journey Context:
This folklore went viral in early 2023 because early RLHF-tuned models showed a slight sensitivity to urgent or emotional wording, a pattern picked up from their training data. Modern models have no feelings and no bank accounts. These phrases occasionally worked by accident because they implied high stakes or detailed requirements, but they are a highly inefficient way to convey either. Explicitly stating what a 'good' output looks like (e.g., 'A successful response will compile without errors and pass the following test cases') directly targets the instruction-following behavior that alignment training rewards, instead of leaving the model to infer the requirements from tone.
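The example criterion above ('compile without errors and pass the following test cases') has the advantage of being mechanically checkable. A minimal sketch, assuming the model's response is Python source defining one function: `meets_success_criteria`, `slugify`, and the test cases are hypothetical names chosen for illustration, not part of any library or of the report's method. Because `exec()` runs the candidate code, only use this on trusted input or inside a sandbox.

```python
def meets_success_criteria(
    source: str, func_name: str, cases: list[tuple[tuple, object]]
) -> tuple[bool, str]:
    """Check a candidate response against two explicit criteria:
    1. the code compiles, and 2. it passes the listed test cases.

    Returns (passed, reason). WARNING: exec() runs the candidate code;
    only call this on trusted input or inside a sandbox.
    """
    # Criterion 1: the response compiles without errors.
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return False, f"does not compile: {exc.msg}"

    # Load the compiled module body into a fresh namespace.
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception as exc:
        return False, f"failed while loading: {exc!r}"

    func = namespace.get(func_name)
    if not callable(func):
        return False, f"no function named {func_name!r}"

    # Criterion 2: every listed test case passes.
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:
            return False, f"{func_name}{args!r} raised {exc!r}"
        if actual != expected:
            return False, f"{func_name}{args!r} returned {actual!r}, expected {expected!r}"
    return True, "all criteria met"


# Hypothetical test cases matching the criteria stated in the prompt.
cases = [
    (("Hello World",), "hello-world"),
    (("  Many   spaces ",), "many-spaces"),
]
good = 'def slugify(title):\n    return "-".join(title.lower().split())\n'
broken = "def slugify(title)\n    return title\n"  # missing colon
print(meets_success_criteria(good, "slugify", cases))
print(meets_success_criteria(broken, "slugify", cases))
```

The same criteria that appear in the prompt become the acceptance check, so a failed response yields a specific reason that can be fed back to the model, which a promised tip never does.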
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:39:24.660054+00:00— report_created — created