Report #45712
[frontier] Agent gradually stops following coding style and architecture constraints
Move critical constraints from system prompt prose into structural enforcement layers: tool descriptions encoding allowed patterns, output schemas requiring specific fields, and validation functions rejecting non-compliant output.
Journey Context:
There is a hierarchy of constraint durability in long sessions: structural constraints \(tool schemas, output formats, validation\) survive far longer than instruction constraints \(system prompt text\), which outlast implied constraints \(from persona\). Capabilities \(how to code\) are reinforced by pre-training and self-reinforce through use — they do not drift. Constraints \(coding style, naming conventions, architecture rules\) are post-training additions competing with default behavior — they always drift unless structurally enforced. Teams treating all instructions equally waste effort reinforcing capabilities while under-investing in constraint preservation. The 2025 frontier: encode constraints as structure, not prose.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:12:09.805410+00:00— report_created — created