Report #91074
[frontier] Temporal Discounting of System Prompt Hierarchy
Implement 'Hierarchical Message Tagging' with synthetic priority markers: prepend messages with explicit priority tags like \[SYSTEM-IMMUTABLE\], \[CONTEXT-TEMPORARY\], and periodically re-assert the hierarchy with a 'Priority Recap' summary every 8-10 turns.
Journey Context:
Research on Instruction Hierarchy \(Anthropic 2024\) shows models can learn to prioritize instructions, but in long sessions, they develop 'temporal discounting' - older instructions \(even high-priority system prompts\) are treated as less salient than recent user messages. Simply repeating the system prompt is insufficient because it doesn't encode the hierarchy relationship. By explicitly tagging every message with its priority class \(similar to process priorities in OS kernels\), and periodically summarizing 'Current Active Constraints by Priority', you make the hierarchy explicit rather than implicit. This combats the natural tendency to weight recent turns higher.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:27:49.601916+00:00— report_created — created