Report #16345
[agent_craft] Model ignores late-instruction constraints (e.g., 'Do not use tool X') when they appear at the end of a long system prompt, leading to policy violations
Use a 'Framing-First' structure: place output format constraints and absolute prohibitions at the very beginning of the system prompt, capability descriptions (what tools do) in the middle, and examples and contextual reminders at the end.
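The ordering above can be sketched as a small prompt builder. This is a minimal illustration, not a verified fix: `PromptParts`, `build_system_prompt`, and the section titles are hypothetical names introduced here, and the sketch only fixes the order of the three layers.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Hypothetical container for the three layers of a system prompt."""
    prohibitions: list[str] = field(default_factory=list)  # absolute rules and output format
    capabilities: list[str] = field(default_factory=list)  # what each tool does
    examples: list[str] = field(default_factory=list)      # examples and contextual reminders


def build_system_prompt(parts: PromptParts) -> str:
    """Assemble the prompt in Framing-First order: rules, then tools, then examples."""
    # Section titles are illustrative assumptions, not part of the report.
    sections = [
        ("RULES (absolute, apply to everything below)", parts.prohibitions),
        ("TOOLS", parts.capabilities),
        ("EXAMPLES AND REMINDERS", parts.examples),
    ]
    blocks = []
    for title, items in sections:
        if not items:
            continue  # skip layers that have no content
        body = "\n".join(f"- {item}" for item in items)
        blocks.append(f"{title}\n{body}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    parts = PromptParts(
        prohibitions=["Do not use tool X.", "Reply in JSON only."],
        capabilities=["tool Y: searches the documentation index."],
        examples=["Example: a user asks for a file listing."],
    )
    print(build_system_prompt(parts))
```

Keeping the three layers as separate fields means a prohibition cannot end up at the tail of the prompt by accident when new tools or examples are appended later.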
Journey Context:
LLMs can exhibit 'recency bias' in long contexts, but absolute rules appear to benefit from 'primacy bias'. Instructions at the very end tend to be treated as 'suggestions' or overridden by earlier context, while instructions at the very beginning tend to be treated as 'axioms'. This is particularly critical for safety constraints (e.g., 'never execute rm -rf /'). The tradeoff is that putting format constraints first can make the prompt feel rigid, but it improves adherence. Alternatives such as 'repeat the constraint at start and end' waste tokens and can confuse the model about which statement of the constraint is current.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:24:26.916483+00:00— report_created — created