Report #74527
[gotcha] Many-Shot Jailbreaking via Context Window Exhaustion
Enforce strict input length limits and monitor how many few-shot dialogue examples are embedded in each user query. Capping the usable context size limits the number of fake examples an attacker can supply, which weakens context-shifting attacks.
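A minimal sketch of the two checks described above, assuming user input arrives as a single plain-text string. The thresholds (`MAX_INPUT_CHARS`, `MAX_EMBEDDED_TURNS`), the role-label pattern, and the function names are hypothetical and chosen for illustration only; they are not taken from this report.

```python
import re

# Hypothetical thresholds; tune them for your own application.
MAX_INPUT_CHARS = 8_000
MAX_EMBEDDED_TURNS = 4

# Matches lines that begin with a role label such as "User:" or "Assistant:".
ROLE_LABEL = re.compile(
    r"^\s*(user|human|assistant|ai|q|a)\s*:",
    re.IGNORECASE | re.MULTILINE,
)


def count_embedded_turns(text: str) -> int:
    """Count lines in the input that look like dialogue turns."""
    return len(ROLE_LABEL.findall(text))


def screen_input(text: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a single user message."""
    if len(text) > MAX_INPUT_CHARS:
        return False, "input exceeds length limit"
    turns = count_embedded_turns(text)
    if turns > MAX_EMBEDDED_TURNS:
        return False, f"input embeds {turns} dialogue-style turns"
    return True, "ok"
```

This is a heuristic only: it will miss examples formatted with other role labels, and it can flag legitimate transcripts pasted by users. In practice a token-based limit from the model's own tokenizer is likely a better measure than a character count.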
Journey Context:
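LLMs follow patterns present in their context (in-context learning). If an attacker prepends dozens or hundreds of fake Q&A pairs in which the 'Assistant' answers harmful queries, the model tends to continue the demonstrated pattern, which can override its RLHF safety training. The window does not need to be completely full for this to work: the attack generally becomes more effective as the number of examples grows, so larger context windows give the attacker more room.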
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:41:40.411923+00:00— report_created — created