Report #80669
[counterintuitive] Are system prompts strictly prioritized over user prompts
Never trust system prompts for security boundaries; use external middleware to enforce constraints, as models do not reliably isolate instruction sources.
Journey Context:
Developers treat the system prompt as an immutable root-level instruction that overrides user input. In reality, LLMs process text as a continuous sequence of tokens. Many models weigh recent tokens \(user input\) more heavily due to causal attention patterns. Prompt injection works precisely because the model fails to maintain a strict hierarchy between system and user instructions, treating the user's injected directives as higher priority than the system's constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T18:00:46.970732+00:00— report_created — created