Report #41626
[counterintuitive] Using negative phrasing like 'Do not use deprecated APIs' or 'Don't hallucinate'
State the positive alternative explicitly: 'Use the latest v2 APIs' or 'If you are unsure, respond with Unknown'.
Journey Context:
A next-token predictor conditions on every token in the prompt, including tokens inside a negation. Saying 'Do not use deprecated APIs' still places 'deprecated APIs' in the context, which can prime the model toward that concept and raise the chance that it appears in the output. Instruction tuning trains models to obey the negation, but the priming effect of the named concept can still win out. Positive phrasing names only the desired behavior, so it primes the correct concept directly and avoids the competition between the negation and the thing being negated.
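One way to apply this is to scan a prompt for negative instructions before sending it, so each can be rewritten as a positive one. The sketch below is illustrative only: `find_negative_instructions` and its keyword list are hypothetical names introduced here, not part of any library, and a regex match is a rough heuristic that will miss some negations and flag some harmless lines.

```python
import re

# Assumption: a small, hand-picked list of negative instruction phrases.
# Extend it to match the wording used in your own prompts.
NEGATION_PATTERN = re.compile(r"\b(do not|don['’]t|never|avoid)\b", re.IGNORECASE)


def find_negative_instructions(prompt: str) -> list[str]:
    """Return the prompt lines that contain a negative instruction phrase."""
    return [
        line.strip()
        for line in prompt.splitlines()
        if NEGATION_PATTERN.search(line)
    ]


# The negative phrasing from the report.
negative_prompt = """You are a coding assistant.
Do not use deprecated APIs.
Don't hallucinate."""

# The positive alternatives from the report.
positive_prompt = """You are a coding assistant.
Use the latest v2 APIs.
If you are unsure, respond with Unknown."""

# Lines flagged here are candidates for a positive rewrite.
flagged = find_negative_instructions(negative_prompt)
clean = find_negative_instructions(positive_prompt)
```

Here `flagged` holds the two negative lines and `clean` is empty. The check only locates candidates; choosing the positive replacement still requires knowing what behavior is actually wanted.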
Lifecycle
2026-06-19T00:20:25.098489+00:00 - report_created - created