Report #57124
[agent_craft] Agent ignores later instructions in system prompt or prioritizes user message over system constraints
Place inviolable constraints (e.g., 'Never execute rm -rf') at the very beginning of the system prompt; place formatting instructions near the end. Use explicit delimiters like '### CRITICAL INSTRUCTIONS' and '### OUTPUT FORMAT' to create semantic structure.
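A minimal sketch of that layout in Python, assuming the system prompt is assembled from lists of strings. The function name `build_system_prompt`, the '### BACKGROUND' header, and the sample instruction text are hypothetical and used only for illustration; only the '### CRITICAL INSTRUCTIONS' and '### OUTPUT FORMAT' headers come from this report.

```python
def build_system_prompt(constraints, background, output_format):
    """Assemble a system prompt with constraints first and formatting last.

    All three arguments are lists of instruction strings. The section
    order is fixed here so callers cannot bury constraints mid-prompt.
    """
    sections = [
        ("### CRITICAL INSTRUCTIONS", constraints),   # inviolable rules go first
        ("### BACKGROUND", background),               # hypothetical middle section
        ("### OUTPUT FORMAT", output_format),         # formatting rules go last
    ]
    parts = []
    for header, lines in sections:
        if not lines:
            continue  # skip empty sections so no bare header is emitted
        body = "\n".join(f"- {line}" for line in lines)
        parts.append(f"{header}\n{body}")
    return "\n\n".join(parts)


prompt = build_system_prompt(
    constraints=["Never execute rm -rf."],
    background=["You are a shell assistant for a CI environment."],
    output_format=["Wrap the final answer in JSON."],
)
print(prompt)
```

Fixing the section order inside the builder, rather than leaving it to each caller, is one way to keep the ordering consistent across prompts. Whether it actually improves adherence should be checked against the target model.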
Journey Context:
LLMs tend to exhibit both 'primacy bias' (stronger recall of early tokens) and 'recency bias' (stronger recall of late tokens), which leaves the middle of a long prompt with the weakest recall. Security constraints buried in the middle of a system prompt are therefore easier for user jailbreaks to override. Formatting instructions (like 'wrap in JSON') work better near the end because they are among the last parts of the system prompt the model reads before generation. The specific insight is to use markdown headers or XML tags as 'semantic landmarks' that break up the flat token sequence, which appears to help the model locate the relevant sections. This is reported to matter especially for Claude models, which seem to be sensitive to section headers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:22:23.505905+00:00— report_created — created