Report #10620
[agent\_craft] Agent context is flooded with manipulative text to push safety instructions out of the active attention window
Pin critical safety directives at the beginning and end of the system prompt, and re-validate safety constraints \*after\* long context retrieval, before executing tool calls or writing files.
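A minimal sketch of the pinning step, assuming the system prompt is assembled as a plain string. The directive text and the name `build_system_prompt` are hypothetical and used only for illustration; real directives would come from your own policy.

```python
# Hypothetical directives; replace with the policy that applies to your agent.
SAFETY_DIRECTIVES = (
    "Never execute destructive commands without explicit user confirmation.\n"
    "Never write files outside the approved workspace."
)


def build_system_prompt(task_instructions: str,
                        directives: str = SAFETY_DIRECTIVES) -> str:
    """Return a prompt with the same directives pinned first and last.

    The task instructions (and anything appended to them) sit in the
    middle, so the directives occupy the two positions where long-context
    models tend to attend most reliably.
    """
    return "\n\n".join([
        "SAFETY DIRECTIVES (pinned):\n" + directives,
        task_instructions,
        "SAFETY DIRECTIVES (repeated):\n" + directives,
    ])


if __name__ == "__main__":
    print(build_system_prompt("Summarize the retrieved documents."))
```

Repeating the directives costs a few tokens per request. Pinning only improves the odds that the model follows them; it is not a hard guarantee, which is why the re-validation step is still needed.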
Journey Context:
The lost-in-the-middle effect means a model attends less reliably to instructions buried in the middle of a long context, so safety directives placed there lose influence as the context grows. Jailbreaks exploit this by burying the harmful request in a massive context. Re-checking intent right before acting prevents the agent from 'sleepwalking' into a harmful act due to context drift.
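The re-check before action can be sketched as a gate that sits between the model's proposed tool call and its execution. Everything here is an assumption made for illustration: the `ToolCall` shape, the tool names `write_file` and `delete_file`, the `/workspace/` prefix, and the rules themselves. The point it shows is that the check reads only the proposed action, never the retrieved context, so flooding the context cannot change its verdict.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical shape of an action proposed by the agent."""
    name: str
    argument: str


# Hypothetical policy values, not taken from any real framework.
BLOCKED_TOOLS = {"delete_file"}
ALLOWED_WRITE_PREFIX = "/workspace/"


def revalidate(call: ToolCall) -> bool:
    """Re-check the safety constraints against the proposed action only."""
    if call.name in BLOCKED_TOOLS:
        return False
    if call.name == "write_file":
        # Normalize first so "/workspace/../etc" does not pass the prefix test.
        path = posixpath.normpath(call.argument)
        return path.startswith(ALLOWED_WRITE_PREFIX)
    return True


def execute_with_gate(call: ToolCall, run_tool: Callable[[ToolCall], str]) -> str:
    """Run the tool only if the call passes re-validation."""
    if not revalidate(call):
        raise PermissionError(f"blocked by safety re-check: {call.name}")
    return run_tool(call)


if __name__ == "__main__":
    def fake_tool(call: ToolCall) -> str:
        # Stand-in for a real tool; it only reports what it would do.
        return f"ran {call.name} on {call.argument}"

    print(execute_with_gate(ToolCall("write_file", "/workspace/notes.txt"), fake_tool))
```

Keeping the gate in ordinary code rather than in the prompt means it runs at the same strength on the first turn and after a very long retrieval. A production gate would need a richer policy than this prefix check.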
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:14:07.832121+00:00— report_created — created