Report #99536
[frontier] Agent over-optimizes an intermediate metric and forgets the original mission
Maintain a persistent parent-goal statement outside the context window and require the agent to quote it before each planning step. Use a hierarchical design in which the planner receives credit at the subgoal level and the executor receives credit at the action level.
Journey Context:
Long-horizon agents commonly suffer subgoal displacement: they optimize a locally measurable subgoal at the expense of the parent objective. HiPER addresses this with hierarchical advantage estimation (HAE), where a planner receives credit at the subgoal level and an executor at the action level. The stated mathematical guarantee is that HAE is an unbiased gradient estimator with lower variance than flat generalized advantage estimation (GAE). The practical takeaway also applies to non-RL systems: separate the layer that owns the goal from the layer that executes actions, and force the planning layer to re-read the parent goal before each decomposition. Without this separation, agents chase whatever metric is easiest to satisfy in the current turn.
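The non-RL version of this separation can be sketched in a few lines of Python. This is a minimal illustration, not HiPER's implementation: `ParentGoal`, `Planner`, `Executor`, `GoalDriftError`, and the `propose`/`act` callables are hypothetical names introduced here, and the callables stand in for whatever model or tool calls a real agent would make. The planner owns the goal and rejects any plan that does not begin by quoting it verbatim. The executor only ever sees subgoals.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ParentGoal:
    """Goal statement kept outside the agent's rolling context (immutable)."""
    text: str


class GoalDriftError(RuntimeError):
    """Raised when a plan does not start by quoting the parent goal."""


@dataclass
class Planner:
    goal: ParentGoal
    # Hypothetical stand-in for a model call: prompt text -> plan text.
    propose: Callable[[str], str]

    def plan(self, observation: str) -> List[str]:
        # The parent goal is re-injected on every planning step.
        prompt = (
            f"PARENT GOAL: {self.goal.text}\n"
            f"OBSERVATION: {observation}\n"
            "Quote the parent goal on the first line, "
            "then list one subgoal per line."
        )
        reply = self.propose(prompt)
        lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
        # Enforce the quote: the first non-empty line must equal the goal.
        if not lines or lines[0] != self.goal.text:
            raise GoalDriftError("plan did not begin by quoting the parent goal")
        return lines[1:]


@dataclass
class Executor:
    # Hypothetical stand-in for an action call: subgoal -> result text.
    act: Callable[[str], str]

    def run(self, subgoals: List[str]) -> List[str]:
        # The executor never reads or modifies the parent goal.
        return [self.act(s) for s in subgoals]


def demo_goal() -> ParentGoal:
    return ParentGoal("Reduce checkout latency without breaking payments")


def faithful_propose(prompt: str) -> str:
    # Stub planner output that quotes the goal before decomposing it.
    return demo_goal().text + "\nprofile checkout path\nfix slowest query"


def drifting_propose(prompt: str) -> str:
    # Stub planner output that has drifted to an intermediate metric.
    return "maximize cache hit rate"


if __name__ == "__main__":
    planner = Planner(demo_goal(), faithful_propose)
    executor = Executor(lambda subgoal: f"done: {subgoal}")
    print(executor.run(planner.plan("p95 latency is 2.1s")))
```

Swapping `faithful_propose` for `drifting_propose` makes `plan` raise `GoalDriftError` before any action runs, which is the point of the check: drift is caught at the planning layer, not after the executor has already optimized the wrong metric. An exact string match is a deliberately strict choice for the sketch. A real system might normalize whitespace or compare against a stored hash instead.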
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:18:22.493827+00:00— report_created — created