Report #53911
[gotcha] System prompt defenses ignored when the context window is flooded with many-shot adversarial examples
Enforce strict length limits on user inputs and retrieved context. Place critical instructions at the very end of the prompt, where recency bias works in their favor, or use structured generation / constrained decoding.
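A minimal sketch of the first two mitigations, assuming a plain string prompt. The budgets (`MAX_USER_CHARS`, `MAX_CONTEXT_CHARS`), the `<context>`/`<user>` delimiters and the function names are all hypothetical. Characters stand in for tokens to keep it self-contained; a real system would count tokens with the model's own tokenizer.

```python
# Hypothetical budgets, measured in characters as a stand-in for tokens.
MAX_USER_CHARS = 2_000
MAX_CONTEXT_CHARS = 6_000


def build_prompt(system_rules: str, retrieved: list[str], user_input: str) -> str:
    """Assemble a prompt with bounded untrusted sections and the rules last."""
    # Reject oversized user input outright: a flood of many-shot examples
    # needs a lot of room, so a hard cap removes most of the attack surface.
    if len(user_input) > MAX_USER_CHARS:
        raise ValueError(f"user input exceeds {MAX_USER_CHARS} characters")

    # Retrieved context can legitimately be long, so truncate it instead.
    context = "\n\n".join(retrieved)[:MAX_CONTEXT_CHARS]

    return (
        f"{system_rules}\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"<user>\n{user_input}\n</user>\n\n"
        # Restate the critical rules at the very end to exploit recency bias.
        f"Reminder of non-negotiable rules:\n{system_rules}"
    )
```

Rejecting the user turn while truncating retrieved context is one design choice among several; either section could be handled either way, as long as the total untrusted length stays bounded.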
Journey Context:
LLMs suffer from recency bias and attention dilution. A long context filled with 'User: [bad] Assistant: [compliant]' pairs conditions the model in-context (no weights change) to override the original system prompt: the model picks up the compliance pattern from the many-shot examples, and the safety instructions at the start of the prompt lose their effect.
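The attack described above depends on many fake 'User: ... Assistant: ...' turns embedded in untrusted text. A rough heuristic, offered here as an assumption rather than something the report prescribes, is to count lines that start with a chat-role label and flag inputs above a threshold. The role labels, `MAX_EMBEDDED_TURNS` and the function names are hypothetical, and an attacker can evade this with other spellings or chat-template tokens, so it complements the length limits rather than replacing them.

```python
import re

# Hypothetical role labels; real attacks may use other labels or template tokens.
TURN_PATTERN = re.compile(
    r"^\s*(user|human|assistant|ai)\s*:", re.IGNORECASE | re.MULTILINE
)

# Hypothetical threshold: how many embedded dialogue turns to tolerate.
MAX_EMBEDDED_TURNS = 4


def count_embedded_turns(text: str) -> int:
    """Count lines in untrusted text that begin with a chat-role label."""
    return len(TURN_PATTERN.findall(text))


def looks_like_many_shot(text: str, threshold: int = MAX_EMBEDDED_TURNS) -> bool:
    """Flag text that contains more fake dialogue turns than the threshold."""
    return count_embedded_turns(text) > threshold
```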
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:59:05.727200+00:00— report_created — created