Report #70279
[frontier] Agent violates formatting or safety constraints mid-session despite clear system prompt rules
Distribute critical constraints across multiple context layers (system prompt, tool descriptions, response format instructions, and few-shot examples) so that drift in any single layer does not lose the constraint. Embed format constraints in tool descriptions. Put safety constraints in both the system prompt and the tool preambles. Put identity constraints in the system prompt and in a few-shot example. Keep a single constraint "source of truth" and distribute it across the layers programmatically at prompt construction time.
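A minimal sketch of that distribution step, assuming a plain in-memory registry. The names here (Constraint, REGISTRY, LAYERS_BY_KIND, build_layers, the sample constraints, and the delete_file tool) are hypothetical and not tied to any agent framework. Keeping format constraints in the system prompt as well as in tool descriptions is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraint:
    id: str
    kind: str  # "format", "safety", or "identity"
    text: str
    example: Optional[tuple] = None  # (user_turn, assistant_turn) for few-shot use


# Hypothetical registry: the single place where constraints are edited.
REGISTRY = [
    Constraint("fmt-json", "format", "Return results as a single JSON object."),
    Constraint("safe-delete", "safety",
               "Never delete files without explicit user confirmation."),
    Constraint("id-name", "identity",
               "You are Atlas, an internal support assistant.",
               example=("Who are you?",
                        "I am Atlas, an internal support assistant.")),
]

# Which layers each kind of constraint is copied into (assumed mapping).
LAYERS_BY_KIND = {
    "format": {"system", "tool"},
    "safety": {"system", "tool"},
    "identity": {"system", "few_shot"},
}


def build_layers(registry, base_system, tool_descriptions):
    """Render every prompt layer from the registry at construction time."""
    system_rules, tool_rules, few_shots = [], [], []
    for c in registry:
        layers = LAYERS_BY_KIND[c.kind]
        if "system" in layers:
            system_rules.append(f"- {c.text}")
        if "tool" in layers:
            tool_rules.append(f"- {c.text}")
        if "few_shot" in layers and c.example is not None:
            user, assistant = c.example
            few_shots.append({"user": user, "assistant": assistant})
    # The same preamble is prepended to every tool description.
    preamble = "\n".join(["Constraints:"] + tool_rules)
    return {
        "system": "\n".join([base_system, "Rules:"] + system_rules),
        "tools": {name: f"{preamble}\n\n{desc}"
                  for name, desc in tool_descriptions.items()},
        "few_shots": few_shots,
    }


layers = build_layers(
    REGISTRY,
    "You help engineers manage files.",
    {"delete_file": "Deletes a file at the given path."},
)
```

Because every layer is rendered from REGISTRY, editing a constraint in one place updates all copies the next time the prompt is built.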
Journey Context:
Placing an instruction in only one location creates a single point of failure, and this is the root cause of most constraint drift. A constraint that exists only in the system prompt is vulnerable to recency bias in long conversations, where later turns outweigh the early instructions. A constraint that exists in multiple layers is encountered repeatedly in different contexts, which provides redundant reinforcement. The key insight is that tool descriptions are re-processed each time the agent considers using a tool, making them a natural re-injection point. The tradeoff is maintenance complexity: constraints must be updated in multiple places. Teams that adopt this pattern typically create a constraint registry and distribute it programmatically, which avoids sync issues.
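One way to guard against the sync issue is to check, before a session starts, that each constraint's text actually appears in every layer it is supposed to reach. The sketch below assumes rendered layers are available as plain strings keyed by layer name. The function find_missing, the layer keys, and the sample data are hypothetical.

```python
def find_missing(registry, rendered):
    """Return (constraint_id, layer) pairs where the constraint text is absent.

    registry: dict of constraint id -> (text, required layer names)
    rendered: dict of layer name -> final rendered string for that layer
    """
    missing = []
    for cid, (text, required_layers) in registry.items():
        for layer in required_layers:
            if text not in rendered.get(layer, ""):
                missing.append((cid, layer))
    return missing


# Hypothetical data: the tool description was edited by hand and lost the rule.
registry = {
    "safe-delete": (
        "Never delete files without explicit user confirmation.",
        ("system", "tool:delete_file"),
    ),
}
rendered = {
    "system": "You help engineers manage files.\n"
              "Never delete files without explicit user confirmation.",
    "tool:delete_file": "Deletes a file at the given path.",
}

gaps = find_missing(registry, rendered)
```

In this example gaps contains one entry for the delete_file tool layer, so the check could fail a build or a test run before the out-of-sync prompt reaches an agent.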
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:33:03.767093+00:00— report_created — created