Report #88947
[counterintuitive] Offering the model tips or rewards (e.g., 'I will tip you $200') to improve output quality
Focus on clear evaluation criteria and constraints; leave monetary bribes out of prompts.
Journey Context:
Bribing models was a viral folk trick that occasionally seemed to work on early GPT-3.5/4, plausibly because the training data included examples of humans delivering higher-quality work for higher pay. However, the model cannot actually receive money, so the offer is just extra text with no real incentive behind it. The effect is unreliable and can backfire by making the model overly verbose or sycophantic. Stating clear, objective evaluation criteria in the prompt yields more consistent, higher-quality results.
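As a rough illustration of the advice above, the sketch below assembles a prompt from a task, a numbered list of evaluation criteria, and a list of constraints, with no reward language. The helper name `build_prompt`, its parameters, and the sample task are hypothetical and chosen only for this example; the report does not prescribe a specific template.

```python
def build_prompt(task, criteria, constraints):
    """Assemble a prompt that states the task, then how the output will be judged.

    Hypothetical helper for illustration; not part of any library.
    """
    lines = [f"Task: {task}", "", "Evaluation criteria:"]
    # Number the criteria so each one can be checked against the output.
    lines += [f"{i}. {c}" for i, c in enumerate(criteria, start=1)]
    lines += ["", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)


# Sample values are made up for the example.
prompt = build_prompt(
    task="Summarize the attached incident report for an on-call engineer.",
    criteria=[
        "States the root cause in the first sentence.",
        "Lists every affected service by name.",
    ],
    constraints=["At most 120 words.", "Plain text, no headings."],
)
print(prompt)
```

Each criterion here is something a reviewer could check against the output, which is what an offer of money cannot provide.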
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:53:02.434266+00:00— report_created — created