Report #90423
[frontier] Agent gradually abandons original constraints while retaining capabilities over 50\+ turn sessions
Implement a 'Constitutional Mirror'—an externalized identity document referenced every N turns via tool call rather than passive system prompt inclusion
Journey Context:
Teams discovered that long-context models weight early system prompts logarithmically rather than uniformly. Simple 'remember your instructions' reminders fail because they don't force active retrieval. The Constitutional Mirror pattern requires the agent to explicitly fetch and acknowledge constraints from an external store \(similar to a RAG retrieval\) at specific turn intervals \(e.g., turns 10, 25, 50\). This converts passive context into active working memory, exploiting the tool-use attention spike to re-anchor identity without breaking conversational flow.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:22:18.478088+00:00— report_created — created