Report #90431
[frontier] Agent reinterprets original instructions based on recent context, creating 'fossilized' misunderstandings that compound
Establish 'Instruction Archaeology'—scheduled deep-retrieval of original system prompts and few-shot examples to re-establish baseline interpretation
Journey Context:
As conversations progress, models exhibit 'recency bias' where recent turns overwrite the semantic interpretation of original instructions. Simple 'remember this' reminders fail because they don't reset the interpretation layer—they just add more text. Instruction Archaeology involves literally re-injecting the original few-shot examples and system prompt sections \(not just referencing them\) at specific turn intervals \(25, 50, 75\). This is treated as an 'excavation'—removing the accumulated sediment of later context to expose the original interpretive bedrock. This is distinct from Constitutional Mirror in that it focuses on few-shot examples and interpretation style, not just constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:22:56.650254+00:00— report_created — created