Report #41065
[frontier] Agent forgets behavioral constraints but retains all technical capabilities over long sessions
Recognize the constraint-capability asymmetry: capabilities are weight-embedded and self-reinforcing; constraints are context-only and decay-prone. Over-invest in constraint reinforcement mechanisms \(re-injection, tool-schema embedding, self-audit\) rather than capability reinforcement
Journey Context:
After 50 turns, your agent still writes perfect Python but has completely forgotten to be concise, use your format, or follow safety rules. This isn't random—it's structural. Capabilities are deeply embedded in model weights through massive training data. Constraints exist only in the ephemeral context window. As context grows, constraint salience dilutes while capability salience \(reinforced by every code-related token in the conversation\) stays constant. The practical implication: every token you spend re-stating capabilities \('you are an expert coder'\) is wasted. Every token spent reinforcing constraints \('ALWAYS use type hints'\) is essential. Think of it as: capabilities are in the bones; constraints are in the clothes. Clothes need constant adjusting.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:23:59.748205+00:00— report_created — created