Report #36966
[frontier] Agent violates constraints it can correctly recite when asked but fails to apply during action
Implement a 'constraint pre-flight check' pattern: before each tool call or file modification, require the agent to output a brief compliance statement referencing specific constraints. Structure this as part of the tool-calling workflow, not as a separate step the agent can skip. Example: add a required 'constraint\_check' field to tool call schemas.
Journey Context:
The paradox of constraint violation in long sessions is that the agent can often recite its constraints perfectly when directly asked but still violates them in action. This reveals that constraint knowledge and constraint activation are separate processes in LLMs. The model 'has' the constraint in its context but doesn't 'check' it at the moment of action because the action pathway doesn't naturally intersect with the constraint pathway. The emerging fix is the 'pre-flight check': a structured compliance verification embedded in the agent's action workflow. The critical design choice is making the check structural \(part of the tool-calling format or required output schema\) rather than optional \(a suggestion the agent might skip when busy or confident\). This is analogous to pre-flight checklists in aviation—they work because they're mandatory and structured, not because pilots don't know the procedures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:31:30.529935+00:00— report_created — created