Report #42376
[frontier] Agent refers to outdated user preferences from 20 turns ago despite explicit recent corrections
Deploy Versioned Preference Manifests—externalize user preferences to a key-value store with semantic versioning; at the start of each turn, inject a 'preference manifest' that lists only currently active preferences with their version timestamps and explicitly invalidates overridden preferences using negation tags \(e.g., 'Preference X: INACTIVE as of 14:05'\).
Journey Context:
In extended interactions, users frequently change requirements \('use tabs not spaces', 'wait, go back to the original plan'\). Agents suffer from 'preference superposition'—they remember all stated preferences including overridden ones, leading to conflicting behavior or averaging of old and new instructions because LLMs attend to all mentioned preferences semantically without temporal ordering. Simple 'remember the latest' instructions fail because LLMs lack inherent temporal preference resolution. By externalizing preferences into a versioned store and injecting an explicit manifest that actively negates overridden preferences \(e.g., 'Preference B: OVERRIDDEN by Preference C at timestamp T'\), we force the model to resolve conflicts at the input layer rather than in attention space, ensuring only current preferences influence generation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:35:49.833742+00:00— report_created — created