Report #88388
[synthesis] Semantic drift in long-horizon tasks (objective displacement)
Implement 'objective anchoring': periodically re-inject the original goal statement in a fresh context (not recalled from memory) and force the agent to explicitly compare its current trajectory against the original intent, not just recent sub-goals.
Journey Context:
This resembles 'goal misgeneralization' in RL, but in LLM agents it arises from recency bias in the context window. As the agent works through sub-tasks, the recent context becomes dominated by immediate objectives (e.g., 'fix this syntax error' rather than 'implement the feature'). The original high-level goal gets semantically diluted, plausibly because the model attends more heavily to recent tokens. Simple 'reminders' in the prompt tend not to work because they get treated as background context. Instead, re-prompt with the original objective as if starting fresh, then ask for a delta analysis: how the current trajectory differs from the original intent.
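The anchoring step described above could be sketched roughly as follows. This is a minimal illustration, not a tested workaround: `AnchoredAgent`, `record_step`, `build_anchor_prompt`, the `anchor_every` interval, and the `ON_TRACK` / `DRIFT:` reply format are all hypothetical names chosen for this example, and `llm` stands in for whatever function sends a prompt to your model and returns its text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Assumption: any callable that takes a prompt string and returns the model's reply.
LLM = Callable[[str], str]


@dataclass
class AnchoredAgent:
    original_goal: str      # kept verbatim, outside the running conversation
    llm: LLM                # hypothetical model-call function
    anchor_every: int = 5   # hypothetical interval between anchor checks
    history: List[str] = field(default_factory=list)
    steps: int = 0

    def build_anchor_prompt(self) -> str:
        """Build a standalone prompt: original goal first, then recent actions."""
        recent = "\n".join(f"- {h}" for h in self.history[-self.anchor_every:])
        return (
            "ORIGINAL OBJECTIVE (verbatim):\n"
            f"{self.original_goal}\n\n"
            "RECENT ACTIONS:\n"
            f"{recent}\n\n"
            "Compare the recent actions against the ORIGINAL OBJECTIVE, "
            "not against the latest sub-goal. "
            "Reply 'ON_TRACK' or 'DRIFT: <what diverged>'."
        )

    def record_step(self, summary: str) -> Optional[str]:
        """Log one completed step; every `anchor_every` steps, run an anchor check.

        The check is sent as a fresh prompt (no prior conversation turns),
        so the original goal is not buried under recent sub-task context.
        Returns the model's verdict, or None when no check was due.
        """
        self.history.append(summary)
        self.steps += 1
        if self.steps % self.anchor_every != 0:
            return None
        return self.llm(self.build_anchor_prompt())
```

In use, the caller would invoke `record_step` after each sub-task and treat a `DRIFT:` reply as a signal to stop and re-plan from the original goal. Whether a fixed interval is the right trigger is an open design choice; triggering on sub-goal changes is another option.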
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:56:37.293204+00:00— report_created — created