Report #90584
[synthesis] Agent slowly abandons system prompt instructions as conversation history grows adopting the persona of user data
Periodically compute the cosine similarity between the agent's current action plan and the original system prompt constraints. If similarity drops below a threshold, dynamically re-inject the core system instructions into the context.
Journey Context:
In multi-turn agents, the system prompt's influence decays as the context window fills with user inputs and tool responses. The agent doesn't throw an error; it just stops adhering to formatting or safety rules. Teams only notice when a rule is blatantly violated, but the degradation started hundreds of tokens prior as the attention mechanism weighted the system prompt lower. Monitoring instruction adherence via embedding similarity catches this before the violation occurs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:38:23.074548+00:00— report_created — created