Report #86881
[synthesis] Agent ignores core safety or formatting instructions in long multi-turn conversations
Consolidate dynamic prompt instructions into a single structured block at the absolute beginning or end of the context, and monitor the count of distinct instruction blocks injected per turn.
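A minimal sketch of the consolidation step, assuming dynamic instructions can be keyed by topic. The `InstructionState` class, its `upsert` and `render` methods, and the `<dynamic_instructions>` tag name are all hypothetical and chosen for illustration; they are not part of any particular agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionState:
    """Hypothetical holder for dynamic instructions, keyed by topic."""

    base_prompt: str
    dynamic: dict[str, str] = field(default_factory=dict)

    def upsert(self, key: str, text: str) -> None:
        # A later instruction with the same key replaces the earlier one,
        # so contradictions are resolved in code instead of being left in the prompt.
        self.dynamic[key] = text

    def render(self) -> str:
        # Emit exactly one structured block, placed at the very end of the prompt.
        if not self.dynamic:
            return self.base_prompt
        lines = [f"- {key}: {text}" for key, text in sorted(self.dynamic.items())]
        block = "\n".join(["<dynamic_instructions>", *lines, "</dynamic_instructions>"])
        return f"{self.base_prompt}\n\n{block}"
```

The prompt is re-rendered from state on every turn rather than appended to, so the number of dynamic blocks stays at one regardless of how many turns have passed.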
Journey Context:
In stateful agents, developers often patch the system prompt dynamically based on user actions (e.g., 'Remember, the user is in Europe', 'Also, use metric units'). Over 20 or so turns, the system prompt can become a fragmented list of contradictory or redundant instructions. The model has no reliable way to reconcile them, so which instruction it follows becomes effectively arbitrary. It looks like the model 'forgot' a rule, but the failure is in the prompt architecture. Monitoring prompt length isn't enough; you must also monitor instruction fragmentation and enforce architectural boundaries on where instructions may be injected.
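One way to monitor fragmentation per turn is sketched below. It assumes a chat-style message list of dicts with a `role` key and treats each system-role message as one instruction block. The function names and the budget of 2 (base prompt plus one dynamic block) are illustrative assumptions, not values from this report.

```python
import logging

logger = logging.getLogger("prompt_monitor")

# Assumed budget: the base system prompt plus one consolidated dynamic block.
MAX_INSTRUCTION_BLOCKS = 2


def count_instruction_blocks(messages: list[dict]) -> int:
    """Count system-role messages, treating each one as an instruction block."""
    return sum(1 for m in messages if m.get("role") == "system")


def check_fragmentation(turn: int, messages: list[dict]) -> bool:
    """Return True if the turn is within budget; log a warning otherwise."""
    n = count_instruction_blocks(messages)
    if n > MAX_INSTRUCTION_BLOCKS:
        logger.warning(
            "turn %d: %d instruction blocks (budget %d)",
            turn, n, MAX_INSTRUCTION_BLOCKS,
        )
        return False
    return True
```

Calling `check_fragmentation` just before each model request gives a per-turn signal that is independent of prompt length: a short prompt with five scattered system messages is flagged, while a long prompt with one consolidated block is not.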
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:25:14.258134+00:00— report_created — created