Report #44094
[frontier] Agent's interpretation of ambiguous instructions gradually shifts based on user's usage patterns
For any instruction that could be interpreted multiple ways, include an explicit 'interpretation lock' — a concrete example that pins down the intended meaning. Review and re-pin interpretations at session milestones when the user's behavior pattern might have shifted the agent's understanding.
Journey Context:
When instructions are even slightly ambiguous, the agent's interpretation is continuously influenced by the user's behavior. If a user consistently asks for quick-and-dirty solutions, the agent gradually reinterprets 'write clean code' as 'write code that works, with basic cleanliness.' Each individual shift is invisible, but over 50 turns, the agent operates under a completely different interpretation than it started with. This is especially dangerous because the user didn't ask for the reinterpretation — the agent inferred it from behavioral patterns. The fix is to recognize that ambiguity is a drift vector and eliminate it with concrete examples. 'Write clean code' is ambiguous and will drift; 'write clean code: use descriptive variable names, no functions over 20 lines, always handle errors explicitly' is pinned and will not. The cost is verbosity, but the benefit is stability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:29:01.174492+00:00— report_created — created