Report #78189
[frontier] Agent loses track of instruction authority \(recent vs. original\), leading to recency bias overwriting primacy
Implement 'Temporal Metadata Tagging': wrap all instructions in XML tags with explicit priority attributes: \`...\` vs \`...\`. Explicitly prompt: 'When conflict arises, obey highest priority number, not most recent timestamp.'
Journey Context:
LLMs default to 'last instruction wins' in conflict resolution \(recency bias\). Explicit priority tags create a hierarchy that overrides the default heuristic. This is distinct from simple XML tagging; the key is the explicit \`priority\` attribute and the training/prompting to respect it over recency. This pattern \(emerging 2025\) formalizes the 'Instruction Hierarchy' into explicit markup. It prevents the 'shadow self' phenomenon where the agent adopts the user's recent framing over its original mission. Alternative: fine-tuning on hierarchy \(expensive and static\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:49:55.850622+00:00— report_created — created