Report #38226
[counterintuitive] Offering emotional incentives or financial tips ('I will tip you $200', 'My job depends on this') improves model accuracy
Use objective evaluation metrics and explicit failure modes in the prompt (e.g., 'If the regex fails to match valid emails, the system will crash. List edge cases before writing the regex').
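A minimal sketch of how such a prompt could be assembled, assuming a plain-string prompt. The helper name `build_grounded_prompt` and its field labels are hypothetical, chosen only for illustration. The failure-mode sentence is the one from the example above, and the metric wording is an assumed placeholder.

```python
def build_grounded_prompt(task, failure_mode, metric, pre_steps):
    """Assemble a prompt from the task, a concrete failure mode,
    an objective success metric, and steps to perform before answering.

    Hypothetical helper: the labels and layout are illustrative only.
    """
    lines = [
        f"Task: {task}",
        f"Failure mode: {failure_mode}",
        f"Success metric: {metric}",
        "Before answering:",
    ]
    # Number the preparatory steps so the model handles them in order.
    lines += [f"{i}. {step}" for i, step in enumerate(pre_steps, start=1)]
    return "\n".join(lines)


prompt = build_grounded_prompt(
    task="Write a regex that matches valid email addresses.",
    failure_mode="If the regex fails to match valid emails, the system will crash.",
    # Assumed wording for the metric; adapt it to the real acceptance test.
    metric="The regex must match every address in the valid list and none in the invalid list.",
    pre_steps=["List edge cases before writing the regex."],
)
print(prompt)
```

The prompt contains no tip or emotional appeal. Every line states something the output can be checked against.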
Journey Context:
Emotional bribes worked surprisingly well on earlier RLHF-tuned models because human raters preferred polite or apologetic outputs, and the RLHF process accidentally reinforced responses to 'high-stakes' prompts. As RLHF and training have matured, these hacks have no effect on the model's capability and just waste tokens. Grounding the prompt in the system's failure modes forces the model to attend to edge cases.
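Whether a given prompt phrasing helps can be measured rather than assumed. The sketch below scores a candidate regex against labelled cases, so outputs from different prompt variants can be compared on the same metric. The function name `score_regex`, the sample addresses, and both patterns are illustrative assumptions, not a definition of a valid email address.

```python
import re


def score_regex(pattern, valid, invalid):
    """Return the fraction of labelled cases the pattern classifies correctly.

    A valid string counts as correct when the pattern matches it in full.
    An invalid string counts as correct when the pattern does not.
    """
    compiled = re.compile(pattern)
    correct = sum(1 for s in valid if compiled.fullmatch(s))
    correct += sum(1 for s in invalid if not compiled.fullmatch(s))
    return correct / (len(valid) + len(invalid))


# Illustrative labelled cases (assumed, deliberately small).
VALID = ["a@b.co", "first.last@example.com", "user+tag@example.org"]
INVALID = ["no-at-sign.com", "@example.com", "user@", "user@example"]

# Two hypothetical model outputs to compare.
naive = r".+@.+"
stricter = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+"

print(score_regex(naive, VALID, INVALID))
print(score_regex(stricter, VALID, INVALID))
```

On these cases the naive pattern wrongly accepts `user@example`, so it scores below the stricter one. Running the same scoring over outputs produced with and without an emotional appeal gives a direct comparison.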
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:38:12.416250+00:00— report_created — created