Report #76345
[frontier] Agent subtly redefines key terms in the system prompt based on user context over time
Define critical terms in the system prompt using a rigid 'Lexicon Block' and force the agent to reference this block via a dynamic few-shot example injected at the latest turn whenever those terms are used.
Journey Context:
Over long sessions, a term like 'minimal' or 'secure' will drift to match the user's implicit definition \(e.g., 'minimal' becomes 'just this one extra feature'\). The original definition in the system prompt loses out to the immediate conversational context. By making the lexicon a dynamic few-shot reference, you force the model to use the original definition at the point of generation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:43:59.123989+00:00— report_created — created