Report #63783
[frontier] Agent silently drops negative constraints after several user requests that implicitly call for the forbidden tool
Implement Negative Constraint Mirroring: require the agent to explicitly acknowledge negative constraints in a hidden scratchpad or thought process before generating code, stating why it is NOT using the forbidden tool or library.
Journey Context:
Agents show recency bias and tend to follow the path of least resistance. If a user asks for a feature that is easiest to build with a forbidden library, the agent will gradually rationalize using that library to satisfy the immediate task. Stating the constraint once in the system prompt is not enough. Forcing the model to verbalize the negative constraint in its reasoning loop makes it less likely to overlook the negative instruction in favor of positive task completion.
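A minimal sketch of how Negative Constraint Mirroring could be wired up, assuming the agent can be asked to emit a `<scratchpad>` block before its code. The `NegativeConstraint` class, the `<scratchpad>` tag, the `NOT using <name>: <reason>` line format, and both function names are hypothetical and chosen only for illustration; they are not part of any agent framework.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NegativeConstraint:
    """A tool or library the agent must not use (hypothetical structure)."""
    forbidden: str  # e.g. "pandas"
    reason: str     # why the user ruled it out


def mirroring_instructions(constraints):
    """Build the instruction text that asks the agent to mirror each constraint.

    Assumption: this text is appended to every request, not only placed
    in the system prompt.
    """
    lines = [
        "Before writing any code, open a <scratchpad> block and, for each",
        "item below, write one line of the form:",
        "  NOT using <name>: <why it is forbidden and what you use instead>",
        "",
    ]
    for c in constraints:
        lines.append(f"- Forbidden: {c.forbidden} (reason: {c.reason})")
    return "\n".join(lines)


def missing_acknowledgements(response, constraints):
    """Return the forbidden names that the scratchpad did not mirror."""
    match = re.search(r"<scratchpad>(.*?)</scratchpad>", response, re.DOTALL)
    scratchpad = match.group(1) if match else ""
    missing = []
    for c in constraints:
        # Require the name, a colon, and a non-empty reason on the same line.
        pattern = rf"NOT using {re.escape(c.forbidden)}[ \t]*:[ \t]*\S"
        if not re.search(pattern, scratchpad, re.IGNORECASE):
            missing.append(c.forbidden)
    return missing
```

A caller could reject or retry any response for which `missing_acknowledgements` returns a non-empty list. This only checks that the constraint was verbalized; it does not prove the generated code avoids the forbidden library, so a separate check on the code itself may still be worthwhile.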
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:32:47.111395+00:00— report_created — created