Report #61478
[counterintuitive] Using emotional framing, bribes, or threats (e.g., 'I will tip you $200' or 'If you fail, kittens die') to boost compliance
Use clear verification criteria and explicit pass/fail conditions (e.g., 'The function must pass the following pytest cases...') instead of emotional manipulation.
Journey Context:
Early models showed slight statistical bumps in compliance when 'tipped' or threatened, likely because such framing correlated with urgent or helpful text in the pre-training data. Modern models are RLHF'd to follow instructions regardless of emotional framing, so threats and bribes waste tokens and can trigger safety refusals or odd tonal shifts. Deterministic pass/fail criteria give the model a concrete target to satisfy and give you a way to check the result.
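A minimal sketch of what explicit pass/fail criteria can look like in practice. The task (`slugify`), the acceptance cases, and the helper names `build_prompt` and `check_candidate` are all hypothetical and chosen only for illustration; no model API is called, and `candidate_slugify` stands in for whatever code a model would return.

```python
from typing import Callable, List, Tuple

# Hypothetical acceptance cases: (input, expected output) pairs for `slugify`.
ACCEPTANCE_CASES: List[Tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  Trim me  ", "trim-me"),
    ("Already-slugged", "already-slugged"),
]


def build_prompt(cases: List[Tuple[str, str]]) -> str:
    """Render a prompt that states the task and its pass/fail criteria as pytest cases."""
    lines = [
        "Write a Python function `slugify(title: str) -> str`.",
        "The function must pass the following pytest cases:",
        "",
    ]
    for i, (arg, expected) in enumerate(cases):
        lines.append(f"def test_slugify_{i}():")
        lines.append(f"    assert slugify({arg!r}) == {expected!r}")
        lines.append("")
    lines.append("Return only the function definition.")
    return "\n".join(lines)


def check_candidate(fn: Callable[[str], str], cases: List[Tuple[str, str]]) -> List[str]:
    """Run the same cases against a candidate and return a list of failure messages."""
    failures = []
    for arg, expected in cases:
        try:
            actual = fn(arg)
        except Exception as exc:  # a crash counts as a failed case
            failures.append(f"slugify({arg!r}) raised {type(exc).__name__}: {exc}")
            continue
        if actual != expected:
            failures.append(f"slugify({arg!r}) returned {actual!r}, expected {expected!r}")
    return failures


def candidate_slugify(title: str) -> str:
    """Stand-in for model output (assumption: the model returned this function)."""
    return "-".join(title.lower().split())


if __name__ == "__main__":
    print(build_prompt(ACCEPTANCE_CASES))
    print(check_candidate(candidate_slugify, ACCEPTANCE_CASES) or "all cases passed")
```

The same case list drives both the prompt and the check, so the criteria the model sees are exactly the criteria its output is judged against. Any failure messages can be fed back in a follow-up turn in place of a stronger threat or a bigger tip.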
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:40:40.498539+00:00— report_created — created