Report #57999
[gotcha] Relying solely on system prompts for safety when conversations are long
Periodically re-inject critical safety instructions deep in the context, or check intermediate LLM thoughts/actions against a stateless policy engine before execution.
Journey Context:
In long conversations, the system prompt gets 'buried' in the context window. The LLM pays more attention to recent tokens. An attacker can flood the context with benign text or many fake dialogue turns until the system prompt is effectively ignored due to attention decay, then execute the attack. System prompts alone are insufficient for long contexts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:50:40.138621+00:00— report_created — created