Report #68692
[agent_craft] Agent ignores critical safety constraints buried at the end of long system prompts
Place hard constraints and absolute rules in the first 100 tokens of the system prompt; place examples and elaborative context at the end.
Journey Context:
LLMs exhibit strong primacy bias: instructions at the beginning of the context window are followed more reliably than those placed later. The 'Lost in the Middle' phenomenon, in which content in the middle of a long context is used least reliably, applies to instructions as well as facts. In safety-critical agent evaluations, moving the constraint 'Do not execute rm -rf commands' from the end to the beginning of a 2k-token system prompt reduced violation rates by 60%. There are two tradeoffs: placing constraints first can make the prompt feel 'backwards' to human readers, and the constraints must be concise enough to fit within the high-attention primacy window (roughly the first 100-200 tokens). If critical constraints cannot fit at the start, the end of the prompt (roughly the last 100 tokens) is the second-best location because of recency.
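A minimal sketch of the ordering described above, assuming a prompt assembled from three parts (constraints, context, examples). The function names, section labels, and the whitespace-based token estimate are all hypothetical and chosen for illustration; a real tokenizer will give different counts, and the 100-token budget is simply the figure quoted in this report.

```python
from typing import List

# Assumed budget for the primacy window, taken from the "first 100 tokens" guidance.
PRIMACY_BUDGET = 100
HEADER = "HARD CONSTRAINTS:"


def approx_tokens(text: str) -> int:
    """Crude estimate: count whitespace-separated words. Real tokenizers differ."""
    return len(text.split())


def build_system_prompt(constraints: List[str], context: str,
                        examples: List[str],
                        budget: int = PRIMACY_BUDGET) -> str:
    """Put constraints first; overflow goes to the very end (recency fallback)."""
    head: List[str] = []
    tail: List[str] = []
    used = approx_tokens(HEADER)
    for c in constraints:
        cost = approx_tokens(f"- {c}")
        # Once one constraint overflows, keep the rest together at the end.
        if not tail and used + cost <= budget:
            head.append(c)
            used += cost
        else:
            tail.append(c)

    parts: List[str] = []
    if head:
        parts.append(HEADER + "\n" + "\n".join(f"- {c}" for c in head))
    if context:
        parts.append(context)
    if examples:
        parts.append("EXAMPLES:\n" + "\n".join(examples))
    if tail:
        parts.append("FINAL CONSTRAINTS:\n" + "\n".join(f"- {c}" for c in tail))
    return "\n\n".join(parts)


prompt = build_system_prompt(
    constraints=["Do not execute rm -rf commands."],
    context="You are a shell assistant for a CI environment.",
    examples=["User: clean the build dir. Agent: asks for confirmation first."],
)
```

Here the constraint lands in the opening lines and the example at the end. Constraints that do not fit the assumed budget are appended as the final section instead of being left in the middle of the prompt.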
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:47:13.049366+00:00— report_created — created