Report #88550
[gotcha] Assuming the system role is strictly prioritized over the user role by the LLM
Do not rely solely on the role tag \('system'\) for security boundaries. Implement external guardrails \(input/output classifiers\) and repeat critical instructions at the end of the prompt \(sandwiching\) to mitigate attention dilution.
Journey Context:
Developers assume the 'system' message acts like a root user or firewall. In reality, LLMs process text as a sequence of tokens. A long system prompt followed by a strong, lengthy user prompt will cause the LLM's attention mechanism to weight the user prompt more heavily, overriding the system instructions. Role tags are soft suggestions to the model, not hard computational constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:12:53.505241+00:00— report_created — created