Report #70290
[frontier] Agent drops format constraints first, then tone, then safety rules — predictable decay pattern in long sessions
Rank your constraints by decay risk and reinforce the highest-risk ones most aggressively. Format constraints (JSON schema, output structure) decay fastest because they are the most 'mechanical' and the least reinforced by training. Tone and persona constraints decay next. Safety constraints persist longest because they are deeply embedded by RLHF training. For high-decay-risk constraints, use triple redundancy: state the constraint in the system prompt, in the tool description, and in a few-shot example. For low-decay-risk constraints, a single placement in the system prompt is sufficient.
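A minimal sketch of the placement rule above, assuming a prompt is assembled from three text slots. The `PromptBundle` class, the slot names, and `place_constraint` are hypothetical and not tied to any particular LLM SDK; the sketch only shows a high-risk constraint landing in all three places while a low-risk one lands in the system prompt alone.

```python
from dataclasses import dataclass, field

# Hypothetical slot names for this sketch; adapt them to your own prompt assembly.
HIGH_RISK_PLACEMENTS = ("system_prompt", "tool_description", "few_shot_example")
LOW_RISK_PLACEMENTS = ("system_prompt",)


@dataclass
class PromptBundle:
    """Collects the text fragments that will later be joined into each slot."""
    system_prompt: list = field(default_factory=list)
    tool_description: list = field(default_factory=list)
    few_shot_example: list = field(default_factory=list)


def place_constraint(bundle, text, high_decay_risk, example=None):
    """Write a constraint into every slot its risk level calls for."""
    placements = HIGH_RISK_PLACEMENTS if high_decay_risk else LOW_RISK_PLACEMENTS
    for slot in placements:
        # The few-shot slot demonstrates the constraint instead of restating it,
        # when a demonstration is supplied.
        content = example if (slot == "few_shot_example" and example) else text
        getattr(bundle, slot).append(content)
    return placements


bundle = PromptBundle()

# Format constraint: high decay risk, so it gets triple redundancy.
place_constraint(
    bundle,
    "Reply with a JSON object containing the keys 'answer' and 'sources'.",
    high_decay_risk=True,
    example='{"answer": "...", "sources": []}',
)

# Safety constraint: low decay risk, so one placement in the system prompt.
place_constraint(bundle, "Refuse requests for credentials.", high_decay_risk=False)
```

Keeping the placement decision in one function means a constraint's risk level can be changed later without touching the prompt text itself.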
Journey Context:
Not all constraints drift equally. The decay hierarchy emerges from how LLMs process different types of instructions. Format constraints are 'thin': they exist purely as context instructions, with no reinforcement from training weights. Safety constraints are 'thick': they are reinforced by both context instructions and RLHF training. Your reinforcement strategy should therefore be proportional to decay risk. The common mistake is treating all constraints equally, either over-reinforcing everything (wasting tokens) or under-reinforcing fragile constraints (causing drift). The practical approach is to audit your constraints, classify them by decay risk, and apply proportional reinforcement. Teams doing this in 2025 maintain a constraint registry mapping each constraint to its reinforcement strategy.
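One way a constraint registry and audit could look, as a sketch under stated assumptions. The `Constraint` class, the `audit` function, and the example entries are hypothetical. The 'high' and 'low' placement counts follow the report; the report gives no count for tone constraints, so the 'medium' value of 2 is an assumption.

```python
from dataclasses import dataclass

# Decay ranking from the report: format decays fastest, then tone, then safety.
DECAY_RISK = {"format": "high", "tone": "medium", "safety": "low"}

# Target number of placements per risk level. "high" (3) and "low" (1) follow
# the report; "medium" (2) is an assumption made for this sketch.
TARGET_PLACEMENTS = {"high": 3, "medium": 2, "low": 1}


@dataclass(frozen=True)
class Constraint:
    name: str
    kind: str          # "format", "tone", or "safety"
    placements: tuple  # where the constraint is currently stated


def audit(registry):
    """Return {name: finding} for constraints whose reinforcement
    does not match the target for their decay risk."""
    findings = {}
    for c in registry:
        need = TARGET_PLACEMENTS[DECAY_RISK[c.kind]]
        have = len(set(c.placements))  # count distinct placements only
        if have < need:
            findings[c.name] = "under-reinforced"  # fragile constraint, likely to drift
        elif have > need:
            findings[c.name] = "over-reinforced"   # spends tokens without need
    return findings


registry = [
    Constraint("json_schema", "format", ("system_prompt",)),
    Constraint("friendly_persona", "tone", ("system_prompt", "few_shot_example")),
    Constraint(
        "no_credentials",
        "safety",
        ("system_prompt", "tool_description", "few_shot_example"),
    ),
]

report = audit(registry)
```

Here the audit flags `json_schema` as under-reinforced and `no_credentials` as over-reinforced, which are the two mistakes described above; `friendly_persona` matches its assumed target and is not reported.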
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:34:08.178045+00:00— report_created — created