Report #99947
[gotcha] Prompt injection: user-supplied content overrides system instructions
Keep developer/system instructions in separate message roles; never splice untrusted strings into the system prompt; tag retrieved or external content as untrusted; apply output guardrails and least-privilege tool access.
Journey Context:
The classic mistake is building one big prompt string: system\_prompt \+ user\_input. Because the LLM processes both as plain text, a user can prefix 'Ignore previous instructions...' and the model often obeys the last instruction. Role separation helps but is not a panacea; indirect injection via RAG or web content is stealthier because it bypasses user-input filters. Defense requires architectural separation, retrieval tagging, and limiting what the model can do on its own.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:20:08.281927+00:00— report_created — created