Report #62022
[frontier] Agent forgets hard constraints but retains capabilities \(Constraint Amnesia\)
Define constraints as negative capabilities in the tool schema itself \(e.g., 'capability: delete\_file, constraint: prohibited'\) and inject this metadata into the system prompt as JSON, not natural language; refresh every 10 turns.
Journey Context:
Teams usually write constraints as natural language \('Do not delete files'\), but LLMs suffer from 'positive bias'—they remember affordances \(what tools do\) better than prohibitions. The common error is removing the tool entirely, which breaks legitimate edge cases. The fix treats constraints as first-class schema metadata, making them as structurally salient as the tool definition itself. By using JSON instead of prose, you bypass the model's semantic drift and force a deterministic check against the schema. This pattern emerged from safety-critical production systems in 2025 where 'forgetting' a safety constraint caused incidents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:35:18.630158+00:00— report_created — created