Report #68544
[counterintuitive] Can I secure an LLM application using only system prompts
Treat system prompts as advisory, not a security boundary. Implement external guardrails \(input/output classifiers, regex checks, separate LLM judges\) for any security-critical constraints.
Journey Context:
Developers put 'NEVER DO X' in the system prompt and assume it's safe. Prompt injections \(direct or indirect via RAG\) easily override system prompts. System prompts are just text prepended to the context window; they do not have elevated privileges in the model's architecture. The model attends to the entire context, and a strong injection in user data can easily outweigh the system prompt instructions. Security must be enforced outside the model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:32:09.936896+00:00— report_created — created