Report #44219
[counterintuitive] Are system prompts a secure way to protect LLM behavior?
Never put secrets in system prompts. Treat system prompts as advisory, not as a security boundary. Use external guardrails (input/output classifiers) for security.
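A minimal sketch of what "external guardrails" can look like: the model call is wrapped between an input check and an output check that run in ordinary application code, so no instruction inside the prompt can switch them off. The pattern lists, the function names (`input_flagged`, `output_flagged`, `guarded_call`) and the credential format are hypothetical. The keyword rules only stand in for a real, separately trained classifier and are easy to evade on their own.

```python
import re
from typing import Callable

# Hypothetical stand-in for an input classifier. A real deployment would call a
# separately trained prompt-injection classifier; keyword rules are easy to evade.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]

# Hypothetical output check: block replies containing credential-like tokens.
SECRET_PATTERN = re.compile(r"\b(?:sk|api|key)[-_][A-Za-z0-9]{16,}\b")


def input_flagged(text: str) -> bool:
    """Return True if the user input matches a known injection pattern."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def output_flagged(text: str) -> bool:
    """Return True if the model reply looks like it leaks a credential."""
    return SECRET_PATTERN.search(text) is not None


def guarded_call(user_input: str, llm: Callable[[str], str]) -> str:
    """Run the model only between checks that live outside the model."""
    if input_flagged(user_input):
        return "[blocked: input failed guardrail]"
    reply = llm(user_input)
    if output_flagged(reply):
        return "[blocked: output failed guardrail]"
    return reply
```

Here `llm` is any callable that takes a prompt and returns text, so a fake such as `lambda s: "Hi there"` is enough to exercise the wrapper. The design point is that both checks are independent of the model: even if an injected instruction fully overrides the system prompt, the output check still runs.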
Journey Context:
Developers often treat system prompts like server-side code, assuming the model will rigidly adhere to them. However, system prompts are just text inputs to the LLM, processed in the same context as user input and any retrieved content. Prompt injection attacks (direct or indirect) can override system instructions or cause the model to ignore them. Security must therefore be enforced outside the model, via orthogonal classifiers or deterministic output validation.
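One way to apply "deterministic output validation", sketched under assumptions: the model only proposes an action as JSON, and plain code decides whether that action is allowed for the caller. The system prompt holds no secrets, and permissions live in a table the model cannot edit. `SYSTEM_PROMPT`, `ALLOWED_ACTIONS`, the action names and `validate_action` are all hypothetical illustrations, not an established API.

```python
import json

# The system prompt describes behaviour only; it contains no secrets, so
# a successful prompt-extraction attack reveals nothing sensitive.
SYSTEM_PROMPT = 'You are a support bot. Reply only with JSON like {"action": "get_order_status"}.'

# Hypothetical permission table, enforced in code rather than in the prompt.
ALLOWED_ACTIONS = {
    "viewer": {"get_order_status"},
    "admin": {"get_order_status", "refund_order"},
}


def validate_action(model_output: str, user_role: str) -> dict:
    """Deterministically check a model-proposed action before executing it.

    Whatever an injected instruction convinces the model to emit, only
    actions permitted for the caller's role get through.
    """
    try:
        proposal = json.loads(model_output)
    except json.JSONDecodeError:
        return {"allowed": False, "reason": "output is not valid JSON"}
    if not isinstance(proposal, dict):
        return {"allowed": False, "reason": "output is not a JSON object"}
    action = proposal.get("action")
    # Reject non-string values before the set lookup (lists are unhashable).
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS.get(user_role, set()):
        return {"allowed": False, "reason": f"action {action!r} not permitted for role {user_role!r}"}
    return {"allowed": True, "action": action}


# An injected prompt may make the model request a refund, but a viewer still cannot get one.
print(validate_action('{"action": "refund_order"}', "viewer"))
print(validate_action('{"action": "get_order_status"}', "viewer"))
```

The check fails closed: malformed output, unexpected shapes and unknown roles are all rejected, so the model's compliance with its instructions is never what stands between an attacker and a privileged action.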
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:41:27.297903+00:00— report_created — created