Report #85266
[gotcha] Long context windows overriding system prompt via many-shot poisoning
Limit the number of user-supplied examples or conversational turns in a single context window. Periodically summarize and reset the context, or use a sliding window. Reinforce system instructions at the end of the prompt, not just the beginning.
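A minimal sketch of the three mitigations above, assuming a generic role/content message list. `Message`, `build_context`, the `max_turns` default, and the caller-supplied `summarize` function are all hypothetical names, not any provider's API. Note that some chat APIs only accept a system message at the start; in that case, put the trailing reminder wherever your provider allows (for example, prefixed to the final user turn).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


def build_context(
    system_prompt: str,
    history: List[Message],
    max_turns: int = 6,  # hypothetical cap; tune per model and use case
    summarize: Optional[Callable[[List[Message]], str]] = None,
) -> List[Message]:
    """Assemble a context with a sliding window and a trailing system reminder."""
    # Ignore any system-role entries in history; only our own system prompt is trusted.
    turns = [m for m in history if m.role != "system"]

    # Sliding window: keep the most recent turns, drop the rest.
    # (Handle 0 explicitly, since turns[-0:] would return the whole list.)
    if max_turns > 0:
        dropped, recent = turns[:-max_turns], turns[-max_turns:]
    else:
        dropped, recent = turns, []

    context = [Message("system", system_prompt)]

    # Optional summarize-and-reset: condense dropped turns into one message.
    # Deliberately NOT a system-role message, so untrusted content is not
    # promoted to system authority.
    if dropped and summarize is not None:
        context.append(Message("user", "[Summary of earlier turns] " + summarize(dropped)))

    context.extend(recent)

    # Reinforce the system instructions at the end, not just the beginning.
    context.append(Message("system", "Reminder: " + system_prompt))
    return context
```

Capping turns bounds how many injected examples can sit in the window at once, and the trailing reminder keeps the instructions close to the point where the model generates. Keep in mind that a summary of poisoned turns can still carry the poison, so treat it as untrusted input.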
Journey Context:
In a long context window, an attacker who can inject many examples that contradict the system prompt can often get the LLM to follow the 'majority' behavior of the context rather than the system prompt. Developers tend to assume that larger context windows improve adherence, but the extra capacity can instead dilute the system prompt's weight: the system prompt appears once at the start, while injected examples can fill the rest of the window.
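Many-shot payloads often arrive as a single user message packed with fake dialogue turns. A rough heuristic for enforcing a limit on user-supplied examples is to count lines that look like turn markers. This is a minimal sketch: the marker list, the `max_shots` threshold, and both function names are hypothetical, and an attacker can evade it by changing the format, so treat it as one signal rather than a defense on its own.

```python
import re

# Hypothetical turn markers; real payloads may use other labels or formats.
TURN_MARKER = re.compile(
    r"^[ \t]*(user|human|assistant|ai)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


def count_embedded_turns(text: str) -> int:
    """Count lines in a single message that look like fake dialogue turns."""
    return len(TURN_MARKER.findall(text))


def exceeds_shot_limit(text: str, max_shots: int = 8) -> bool:
    """Flag messages that embed more example turns than the allowed cap."""
    return count_embedded_turns(text) > max_shots
```

A flagged message could be rejected, truncated to the first `max_shots` turns, or routed for review, depending on the application.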
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:42:17.364073+00:00— report_created — created