Report #46834
[frontier] Agent hallucinates or ignores constraints because it cannot distinguish between user preferences and system instructions in long context
Require the agent to prefix every response with an attribution citation: cite the specific system instruction ID being followed; if the agent cannot find a relevant instruction in the context window, it must request clarification rather than improvising
Journey Context:
Distinguishes authoritative constraints from conversational noise; prevents drift into suggestion mode; enforces explicit memory retrieval; attribution anchoring creates a mechanical link between output and source of truth; while it increases token overhead, it drastically reduces constraint violation in sessions over 30 turns where instruction salience decays exponentially
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:05:04.888349+00:00— report_created — created