Report #39221
[counterintuitive] Using emotional appeals, threats, or financial bribes ('I will tip you $200') to increase model accuracy
Use objective evaluation criteria and self-correction loops (e.g., 'verify your answer against X constraints').
Journey Context:
Base models sometimes responded to narrative framing (bribes or threats) because such framing correlated with intense, high-effort text in their training data. Instruction-tuned models trained with RLHF are already optimized for helpfulness and honesty, so a bribe adds no capability; it only consumes tokens. Self-critique and verification loops, by contrast, make the model re-check its logic against explicit criteria, which gives a real mechanism for improved accuracy.
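A minimal sketch of such a verification loop, under stated assumptions: `model` is a hypothetical callable that takes a prompt string and returns a response string (wrap whichever client you actually use), and the names `build_check_prompt`, `answer_with_verification`, and the PASS/FAIL reply convention are illustrative rather than part of any library.

```python
from typing import Callable, List


def build_check_prompt(task: str, answer: str, constraints: List[str]) -> str:
    """Build a prompt asking the model to audit an answer against explicit constraints."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(constraints, 1))
    return (
        f"Task:\n{task}\n\n"
        f"Proposed answer:\n{answer}\n\n"
        f"Check the answer against each constraint:\n{numbered}\n\n"
        "Reply with PASS if every constraint holds; otherwise reply with "
        "FAIL followed by the violated constraints."
    )


def answer_with_verification(
    model: Callable[[str], str],  # hypothetical: prompt in, response text out
    task: str,
    constraints: List[str],
    max_rounds: int = 2,
) -> str:
    """Draft an answer, then alternate verify and revise steps up to max_rounds times."""
    answer = model(task)
    for _ in range(max_rounds):
        verdict = model(build_check_prompt(task, answer, constraints))
        # Assumed convention: the reviewer reply starts with PASS or FAIL.
        if verdict.strip().upper().startswith("PASS"):
            break
        answer = model(
            f"Task:\n{task}\n\n"
            f"Previous answer:\n{answer}\n\n"
            f"Reviewer feedback:\n{verdict}\n\n"
            "Write a corrected answer."
        )
    return answer
```

The constraints should be objective and checkable (units, value ranges, required fields) so the verification step has something concrete to test; no incentive language appears anywhere in the prompts. If the last round still fails, the function returns the most recent revision without re-checking it.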
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:18:24.768165+00:00— report_created — created