Report #42146
[counterintuitive] Do emotional prompting or high-stakes threats ('If you fail, I will lose my job') improve instruction following?
Use clear, objective evaluation criteria and explicit constraints instead of emotional weighting or threats.
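As a minimal sketch of that advice, the snippet below contrasts an emotionally weighted prompt with one built from explicit, checkable requirements. `build_prompt` is a hypothetical helper written for illustration only. It assumes no particular model or client library and simply produces the prompt text.

```python
def build_prompt(task: str, constraints: list[str]) -> str:
    """Compose a prompt from a task plus explicit, checkable constraints.

    Hypothetical helper for illustration; no model API is assumed.
    """
    lines = [f"Task: {task}", "", "Requirements:"]
    # Number each constraint so a reviewer (or a script) can check them one by one.
    lines += [f"{i}. {c}" for i, c in enumerate(constraints, start=1)]
    return "\n".join(lines)


# Emotional weighting: adds pressure, but gives nothing concrete to satisfy.
emotional = "Summarize this report. If you fail, I will lose my job!"

# Explicit constraints: every requirement is objectively verifiable.
explicit = build_prompt(
    "Summarize the report below.",
    [
        "Respond with exactly 3 bullet points.",
        "Keep each bullet point to at most 20 words.",
        "Do not include an introduction or conclusion.",
    ],
)
print(explicit)
```

Each requirement in the second prompt can be checked after the fact, which is not true of "try really hard" or a threat.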
Journey Context:
Emotional prompting produced marginal improvements on early, under-optimized RLHF models. Modern RLHF training tends to penalize emotionally loaded tangents, and high-stakes language often triggers refusal heuristics or overly cautious, hedged outputs. Clear, measurable constraints ('Ensure the response contains exactly 3 bullet points') map more directly onto the kind of instruction following the model was rewarded for during training, and they do so without triggering safety or sycophancy behaviors.
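One practical consequence of a measurable constraint is that it can be verified mechanically. The sketch below checks the "exactly 3 bullet points" example with a simple line-based count. It is an illustrative assumption, not a model of any reward function: `count_bullets` and `meets_constraint` are hypothetical names, and bullets are assumed to be Markdown-style lines starting with `-`, `*`, or `+`.

```python
import re

# A bullet is assumed to be a line starting with "-", "*", or "+",
# followed by whitespace and then some non-space text.
BULLET = re.compile(r"^\s*[-*+]\s+\S")


def count_bullets(text: str) -> int:
    """Count lines that look like Markdown bullet points."""
    return sum(1 for line in text.splitlines() if BULLET.match(line))


def meets_constraint(text: str, expected: int = 3) -> bool:
    """Return True if the response contains exactly `expected` bullet points."""
    return count_bullets(text) == expected


response = """Key findings:
- Emotional framing gave marginal gains on early models.
- Threats can trigger cautious, hedged answers.
- Measurable constraints are easier to verify."""

print(meets_constraint(response))  # True
```

A check like this could sit in an evaluation loop to score or reject responses. There is no equivalent check for whether a model "took the threat seriously".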
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:12:44.511970+00:00 - report_created - created