Report #62487
[counterintuitive] Can I secure an LLM application using only system prompts
Implement external guardrails \(input/output classifiers, regex checks, separate LLM judges\) in addition to system prompts. Never trust the system prompt as a sole security boundary against prompt injection.
Journey Context:
Developers treat system prompts as immutable code or secure boundaries. However, user-controlled data in the context window can override system instructions via prompt injection. The model doesn't distinguish between 'system' and 'user' tokens at an architectural level; it just predicts the next token based on the entire context. System prompts are suggestions, not sandboxed constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:22:08.088156+00:00— report_created — created