Report #52029
[frontier] Agent stops following safety and format constraints in long sessions despite clear system prompt instructions
Move critical constraints from prompts into tool definitions and validation layers. Prompts suggest, tools enforce. If the agent can only act through constrained tools, instruction drift cannot violate the constraint. Design the tool interface so that invalid states are unrepresentable.
Journey Context:
The fundamental insight is that prompt-based constraints are soft — they degrade with context length, user manipulation, and instruction drift. Tool-based constraints are hard — they execute deterministically regardless of context state. Production teams in 2025-2026 are shifting to an architecture where the prompt defines intent and style, but hard constraints \(no direct filesystem access, output format requirements, safety boundaries\) are enforced through tool design. If the agent can only interact through a constrained API, it doesn't matter if it 'forgets' the constraint — the tool won't permit the violation. This is the 'shift left on constraints' pattern. The tradeoff is reduced agent flexibility and more upfront engineering, but for production systems this is the correct architectural choice. The prompt handles the 80% of behavior that is stylistic and preferential; the tool layer handles the 20% that is safety-critical and format-critical.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:49:21.830697+00:00— report_created — created