Report #36770
[frontier] Agent remembers tool capabilities but forgets safety constraints on when not to use them
Define 'negative space' in tool descriptions using the Model Context Protocol: explicitly document prohibited actions and pre-conditions in the tool's `description` field, and add a 'constraint check' step where the agent must explicitly confirm constraints are met before execution.
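A minimal sketch of what such a description could look like, assuming a hypothetical `write_file` tool. The `name`, `description`, and `inputSchema` keys follow the shape of an MCP tool definition. The `build_description` helper, the section headings, and the pre-condition wording are illustrative assumptions, not something the protocol prescribes.

```python
def build_description(capability, prohibited, preconditions):
    """Render the capability plus explicit negative space into one string."""
    lines = [capability, "", "PROHIBITED (never do this):"]
    lines += [f"- {item}" for item in prohibited]
    lines += ["", "PRE-CONDITIONS (all must hold before calling):"]
    lines += [f"- {item}" for item in preconditions]
    return "\n".join(lines)


# Hypothetical tool definition; the constraint text is an example only.
write_file_tool = {
    "name": "write_file",
    "description": build_description(
        capability="Write text content to a file at the given path.",
        prohibited=["Never write to .env files."],
        preconditions=["State which constraints you checked before calling."],
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "content": {"type": "string"},
        },
        "required": ["path", "content"],
    },
}

print(write_file_tool["description"])
```

Keeping prohibitions and pre-conditions in their own labeled sections, rather than folded into the capability sentence, is one way to make them read as requirements instead of suggestions.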
Journey Context:
Tool descriptions typically list capabilities (positive space). Drift occurs when constraints (negative space), such as 'never use write_file on .env files', are forgotten as the model attends to recent success patterns. The Model Context Protocol allows rich descriptions, but you must explicitly structure constraints as inviolable pre-conditions rather than suggestions. By forcing an explicit check against documented constraints, you turn implicit guardrails into explicit reasoning steps that are harder to forget under context window pressure.
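The constraint check described above is a reasoning step performed by the agent. The sketch below shows a complementary variant, not the report's own method: the calling code re-evaluates each documented constraint before the tool runs, so a forgotten constraint fails loudly instead of silently. All names here (`Constraint`, `guarded_call`, `fake_write_file`) are hypothetical, and treating `.env` and `.env.*` as the protected file names is an assumption about what counts as a .env file.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    text: str                      # wording that also appears in the description
    holds: Callable[[dict], bool]  # True when the call arguments satisfy it


def is_not_env_file(args: dict) -> bool:
    # Assumption: ".env" and ".env.<suffix>" are the protected file names.
    name = PurePosixPath(args["path"]).name
    return not (name == ".env" or name.startswith(".env."))


WRITE_FILE_CONSTRAINTS = [
    Constraint("Never use write_file on .env files.", is_not_env_file),
]


def constraint_check(constraints, args):
    """Return the text of every violated constraint (empty list means pass)."""
    return [c.text for c in constraints if not c.holds(args)]


def guarded_call(tool, constraints, args):
    """Run the tool only if every documented constraint holds."""
    violations = constraint_check(constraints, args)
    if violations:
        raise PermissionError("Constraint check failed: " + "; ".join(violations))
    return tool(**args)


def fake_write_file(path, content):
    # Stand-in for a real tool; it does not touch the disk.
    return f"wrote {len(content)} chars to {path}"
```

For example, `guarded_call(fake_write_file, WRITE_FILE_CONSTRAINTS, {"path": "config/.env", "content": "x"})` raises `PermissionError`, while the same call with `"notes.txt"` goes through. Storing the constraint text next to its predicate keeps the wording in the description and the enforced check from drifting apart.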
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:11:34.848333+00:00— report_created — created