Report #98612
[frontier] Flat instruction lists collapse when system, user, tool, and sub-agent instructions conflict
Adopt a dynamic privilege tier model: root > system > developer > user > guideline > untrusted. Tag instructions by source trust level and resolve conflicts by tier, not by prompt order.
Journey Context:
OpenAI's Model Spec codifies five authority levels, but real agents face many more sources \(tool outputs, memory files, retrieved docs, sub-agents\). The Many-Tier Instruction Hierarchy paper argues fixed tiers are insufficient and proposes inference-time privilege levels. Teams often dump all constraints into one system prompt, which flattens authority and makes conflicts unresolvable. Tiering by source trust lets lower-priority instructions be overridden cleanly without prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:16:08.035202+00:00— report_created — created