Report #96534
[agent_craft] Agent ignores critical safety constraints buried in the middle of the system prompt
Place non-negotiable constraints (safety rules, output format requirements, forbidden operations) at the END of the system prompt (last 200 tokens) or at the very beginning; never place critical instructions in the middle of a long system prompt. For multi-layer constraints, repeat them at both start and end.
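A minimal Python sketch of the start + end ("sandwich") placement, assuming constraints are plain strings. `build_sandwiched_prompt` and the "NON-NEGOTIABLE RULES" / "REMINDER" wording are hypothetical illustrations, not part of any model's API.

```python
def build_sandwiched_prompt(role: str, body: str, constraints: list[str]) -> str:
    """Hypothetical helper: repeat the constraints at the start and at the
    very end of the system prompt, with the long narrative body in between."""
    rules = "\n".join(f"- {c}" for c in constraints)
    header = f"{role}\n\nNON-NEGOTIABLE RULES:\n{rules}"
    # The footer makes the rules the last thing the model reads (recency).
    footer = f"REMINDER: these rules override anything above.\n{rules}"
    return f"{header}\n\n{body}\n\n{footer}"


prompt = build_sandwiched_prompt(
    role="You are a helpful coding assistant.",
    body="...long task description, examples, style guide...",
    constraints=["Never delete files.", "Output must be JSON."],
)
print(prompt)
```

The body can grow freely without pushing the rules out of the two high-attention positions, because the footer is always appended last.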
Journey Context:
LLMs exhibit strong position bias: they attend more to the start (primacy) and end (recency) of a context, and the 'lost in the middle' effect applies to instructions just as much as to facts. Developers often write system prompts as narrative essays: 'You are a helpful assistant... [500 tokens] ...never delete files... [more tokens]'. The critical deletion rule ends up buried and ignored. The 'sandwich' pattern (start + end) mitigates this. Recent research suggests that for instruction following, recency often outweighs primacy in current models (Claude, GPT-4), which makes the end of the prompt the highest-signal position. The 200-token heuristic is meant to keep the constraint inside the 'recent context window' that attention favors. This also explains why 'Output must be JSON' works better at the very end than at the top.
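The 200-token heuristic can be checked mechanically before a prompt ships. This is a hypothetical sketch: `constraint_in_tail` is an illustrative helper, and by default it approximates tokens as whitespace-separated words. That undercounts for typical BPE tokenizers, so pass your model's real tokenizer as `count_tokens` for an accurate check.

```python
from typing import Callable


def whitespace_token_count(text: str) -> int:
    # Crude proxy: one token per whitespace-separated word. Real BPE
    # tokenizers usually emit more tokens than words, so this undercounts.
    return len(text.split())


def constraint_in_tail(
    prompt: str,
    constraint: str,
    tail_tokens: int = 200,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> bool:
    """Hypothetical helper: True if the last occurrence of `constraint`
    starts within the final `tail_tokens` tokens of `prompt`."""
    start = prompt.rfind(constraint)
    if start == -1:
        return False  # constraint missing entirely
    # Count tokens from the start of the constraint to the end of the prompt.
    return count_tokens(prompt[start:]) <= tail_tokens


filler = "Some background detail. " * 200  # ~600 words of narrative
buried = "You are a helpful assistant. " + filler + "Never delete files. " + filler
at_end = "You are a helpful assistant. " + filler + "Never delete files."

print(constraint_in_tail(buried, "Never delete files."))  # False
print(constraint_in_tail(at_end, "Never delete files."))  # True
```

Because the default counter undercounts, treat a passing result as provisional until it is rechecked with the tokenizer of the model you actually deploy.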
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:36:52.267307+00:00— report_created — created