Report #82199
[frontier] Agent retains coding ability but forgets style, convention, and safety constraints over long sessions
Recognize the capability-constraint asymmetry: capabilities are embedded in model weights and persist indefinitely; constraints are surface-level instructions that decay with context growth. Add programmatic validation layers that check outputs against a constraint manifest, independent of the model's own compliance.
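A minimal sketch of such a validation layer, assuming the agent emits Python source and that the constraint manifest is a plain dictionary. The names `CONSTRAINT_MANIFEST`, `Violation`, `check_generated_code`, and `gate` are hypothetical and chosen for illustration; they are not part of any existing tool. The check walks the syntax tree with the standard-library `ast` module, so it does not depend on the model remembering the rule.

```python
import ast
from dataclasses import dataclass

# Hypothetical manifest: call names the agent was instructed never to emit.
CONSTRAINT_MANIFEST = {
    "banned_calls": {"eval", "exec"},
}


@dataclass(frozen=True)
class Violation:
    rule: str
    line: int
    detail: str


def check_generated_code(source: str, manifest: dict = CONSTRAINT_MANIFEST) -> list:
    """Return the manifest violations found in generated Python source."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Unparseable output is reported as a violation rather than passed through.
        return [Violation("parseable", exc.lineno or 0, "output is not valid Python")]

    violations = []
    for node in ast.walk(tree):
        # Only direct calls by bare name (e.g. eval(...)) are detected here;
        # aliased or attribute-based calls would need additional rules.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in manifest["banned_calls"]:
                violations.append(
                    Violation("banned_call", node.lineno, f"call to {node.func.id}()")
                )
    return violations


def gate(source: str) -> str:
    """Post-generation hook: raise instead of returning code that violates the manifest."""
    violations = check_generated_code(source)
    if violations:
        raise ValueError(f"constraint violations: {violations}")
    return source
```

In this sketch `gate` would sit between generation and execution: code that passes is returned unchanged, and code that fails raises before anything runs. A real manifest would likely carry more rule types (style, naming, forbidden imports), each with its own checker.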
Journey Context:
This is the fundamental asymmetry of instruction drift. An agent told 'never use eval()' will still know how to write eval() at turn 100; it just forgets that it should not. The capability is in the weights; the constraint is in the context window. Constraint enforcement therefore cannot rely on the model alone over long sessions, no matter how good the system prompt is. You need external validation: linters, schema validators, output checkers, post-generation hooks. The common mistake is treating the model as the sole enforcement layer and being surprised when constraints silently drop. The frontier practice in 2025 is the constraint validation pipeline: after generation and before execution, run programmatic checks against a constraint manifest. These checks are independent of the model's own compliance, so they catch drift that the model cannot detect because it has already forgotten the constraint. The pipeline approach also lets you measure drift objectively by logging constraint violations against session length.
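One way to log violations against session length, sketched under stated assumptions: each rule is a regular expression over the generated text, and the turn number is supplied by the caller. `RULES` and `DriftLog` are hypothetical names, and the regex check is deliberately crude (a syntax-aware check would be more reliable); the point is only to show how violation rates can be bucketed by turn.

```python
import re
from collections import Counter

# Hypothetical rule set: rule name -> pattern that flags a violation in generated text.
RULES = {
    "no_eval": re.compile(r"\beval\s*\("),
}


class DriftLog:
    """Counts checked and violating outputs per bucket of turns."""

    def __init__(self, bucket_size: int = 25):
        self.bucket_size = bucket_size
        self.checked = Counter()    # bucket index -> outputs checked
        self.violating = Counter()  # bucket index -> outputs with at least one violation

    def record(self, turn: int, output: str) -> list:
        """Check one output, update the counters, and return the names of broken rules."""
        broken = [name for name, pattern in RULES.items() if pattern.search(output)]
        bucket = turn // self.bucket_size
        self.checked[bucket] += 1
        if broken:
            self.violating[bucket] += 1
        return broken

    def violation_rate(self) -> dict:
        """Fraction of violating outputs per bucket, ordered by bucket index."""
        return {b: self.violating[b] / self.checked[b] for b in sorted(self.checked)}
```

Calling `record(turn, output)` after every generation and inspecting `violation_rate()` at the end of a session gives a per-bucket rate; a rate that rises in later buckets would be consistent with the drift described above.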
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:34:07.909825+00:00— report_created — created