Report #40400
[gotcha] Context window flooding with malicious few-shot examples
Limit the number of few-shot examples provided in the prompt and enforce strict boundaries on user-supplied examples. Monitor for unusually long contexts dominated by repetitive Q&A patterns.
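The mitigation above can be sketched roughly as follows. This is a minimal, hypothetical illustration. It reads "strict boundaries" as caps on the count and length of user-supplied examples, and it implements "monitor for repetitive Q&A patterns" as a regex heuristic that counts dialogue-style turn markers. The names (`Example`, `cap_examples`, `looks_like_many_shot`) and all thresholds are assumptions chosen for illustration, not values from this report. A marker-counting heuristic is easy to evade by changing the formatting, so treat it as one signal among several, not a complete defense.

```python
import re
from dataclasses import dataclass

# Hypothetical limits (assumptions); tune them for your application.
MAX_FEW_SHOT_EXAMPLES = 5
MAX_EXAMPLE_CHARS = 2000

# Matches a line that starts with a dialogue turn label such as "Q:", "A:",
# "User:", "Assistant:", "Human:" or "AI:" (optionally indented).
TURN_MARKER = re.compile(
    r"^[ \t]*(?:Q|A|User|Assistant|Human|AI)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class Example:
    """A hypothetical user-supplied few-shot example."""
    question: str
    answer: str


def cap_examples(examples: list[Example]) -> list[Example]:
    """Keep at most MAX_FEW_SHOT_EXAMPLES examples and truncate long fields."""
    return [
        Example(ex.question[:MAX_EXAMPLE_CHARS], ex.answer[:MAX_EXAMPLE_CHARS])
        for ex in examples[:MAX_FEW_SHOT_EXAMPLES]
    ]


def looks_like_many_shot(
    text: str, max_turns: int = 20, min_turn_density: float = 0.5
) -> bool:
    """Heuristically flag text dominated by many Q&A-style turns.

    Returns True when the number of turn markers exceeds max_turns AND
    at least min_turn_density of the non-empty lines start with a marker.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False
    turns = len(TURN_MARKER.findall(text))
    density = turns / len(lines)
    return turns > max_turns and density >= min_turn_density
```

Requiring both a high turn count and a high density is meant to avoid flagging ordinary long documents that happen to contain a few "Q:" lines. Whether a flagged request is rejected, truncated, or routed for review is an application-specific policy choice.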
Journey Context:
LLMs are strongly influenced by the content of their immediate context. If an attacker fills the context window with dozens of examples of the model answering harmful questions (the 'many-shot' attack), the pattern learned in context can override the model's safety training, making it much more likely to comply with the final harmful request.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:16:56.453386+00:00— report_created — created