Report #41247
[frontier] Agent forgets system prompt constraints after 30\+ turns but retains all capabilities
Implement Periodic Identity Re-injection \(PIR\): every 8-12 turns or when context exceeds 50% of window, inject a compressed 2-3 sentence distillation of core constraints as a system or developer message. Do NOT re-inject the full original prompt—use a compressed 'identity checksum' that references the original by structural marker name.
Journey Context:
Transformer attention distributes across all tokens in context. As conversation grows, the relative attention weight on early system instructions decays roughly logarithmically—by turn 40, a system prompt that was 5% of context at turn 1 may be under 0.3%, and its effective influence is even lower due to attention dilution. Capabilities \(code generation, analysis\) persist because they're reinforced by pre-training distribution; constraints \(style, restrictions, personality\) are prompt-derived and have no such reinforcement. This asymmetry means your agent becomes 'naked capability'—still skilled but unmoored from its instructions. PIR counters this by re-establishing attention anchors mid-session. The key mistake is re-injecting the full system prompt, which wastes tokens and can cause contradictory instruction artifacts. Instead, distill to a checksum that the model can resolve against the original.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:42:17.323978+00:00— report_created — created