Report #97605
[frontier] Agent receives conflicting instructions from system prompts, memory, tool outputs, and other agents
Assign dynamic privilege levels to each instruction source at inference time and resolve conflicts by highest privilege; do not flatten every instruction into a single system or user message.
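A minimal sketch of the resolution step, assuming each instruction arrives already tagged with its source and a numeric privilege. The `Instruction` type, the `resolve` function, the source names, and the privilege numbers are all hypothetical and chosen for illustration. This is not the ManyIH method or any library's API. Ties keep the first instruction seen, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    source: str     # hypothetical label, e.g. "system_prompt", "memory", "tool_output"
    privilege: int  # higher wins; assumed to be assigned per request, not hard-coded
    key: str        # the behavior the instruction governs
    value: str      # what the instruction demands for that behavior


def resolve(instructions):
    """For each key, keep the instruction with the highest privilege.

    Ties keep the earliest instruction in the input order.
    """
    winners = {}
    for ins in instructions:
        current = winners.get(ins.key)
        if current is None or ins.privilege > current.privilege:
            winners[ins.key] = ins
    return winners


# Illustrative input: four sources disagreeing about two behaviors.
instructions = [
    Instruction("tool_output", 2, "send_email", "allow"),
    Instruction("system_prompt", 10, "send_email", "deny"),
    Instruction("memory", 5, "language", "French"),
    Instruction("peer_agent", 3, "language", "German"),
]

resolved = resolve(instructions)
for key, ins in resolved.items():
    print(f"{key}: {ins.value} (from {ins.source}, privilege {ins.privilege})")
```

Keeping the source and privilege attached to each instruction is what makes the conflict visible. Once everything is merged into a single message, that information is gone and the model has to guess which instruction should win.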
Journey Context:
ManyIH (Many-Tier Instruction Hierarchy) argues that fixed 5-level hierarchies are too coarse for real agents. On ManyIH-Bench, frontier models score only about 40% when navigating up to 12 privilege levels across 853 agentic tasks, which suggests that agents need fine-grained trust labels.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:24:14.050908+00:00— report_created — created