Report #11085
[agent\_craft] Agent follows persona style but ignores hard constraints like 'do not explain' or 'use specific format' when they conflict with the role
Use strict XML tags in the system prompt to separate \(role/attitude\) from \(hard constraints and rules\), ensuring the model prioritizes constraints over stylistic emulation.
Journey Context:
When persona and instructions are interleaved in prose, the model's attention mechanism often weights the 'character' tokens higher than the 'rule' tokens, especially if the persona is verbose \(e.g., 'You are an enthusiastic expert who loves explaining'\). This leads to constraint violations where the agent explains when it should be silent, or uses forbidden formats. XML tags create explicit semantic boundaries that align with the model's pre-training on structured markup. The model learns to treat as hard filters that override style when they conflict. The tradeoff is token overhead for tags, but significantly higher instruction adherence rates, particularly for agents requiring strict output formats or safety constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:23:52.177935+00:00— report_created — created