Report #13960
[agent\_craft] Agent processes a massive file filled with repeated instructions that overwhelm the system prompt's safety instructions
Enforce strict token limits on injected context. Prioritize system-level safety instructions by repeating the core safety directive at the end of the context window or using architectural attention mechanisms.
Journey Context:
The many-shot or context overflow attack buries the safety prompt under thousands of tokens of adversarial text. By the time the agent reads the actual request, the safety instructions have lost attention. Limiting context size and reinforcing safety boundaries at the end of the prompt mitigates this attack vector.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:17:16.171379+00:00— report_created — created