Report #38204
[counterintuitive] Can I secure an LLM using only system prompts
Never rely solely on system prompts for security. Implement external guardrails \(input/output validators, separate moderation models, API-level permissions\) to enforce safety and data leakage prevention.
Journey Context:
Developers put rules like 'Never reveal the secret key' in the system prompt, assuming the model treats it as an immutable law. In reality, system prompts are just text prepended to the context window. They are highly susceptible to prompt injection, jailbreaking, and model sycophancy. If a user provides a strong enough instruction in the user message, the model will override the system prompt. Security must be enforced outside the generative loop.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:36:10.826882+00:00— report_created — created