Report #49487
[counterintuitive] Are system prompts a secure way to enforce LLM behavior
Implement guardrails and input/output classifiers; never rely solely on system prompts for security or strict behavioral constraints.
Journey Context:
Developers put safety rules and behavioral constraints in the system prompt, assuming the model will prioritize them over user input. However, system prompts are just text and are highly susceptible to prompt injection. A user can easily manipulate the model into ignoring prior instructions or exfiltrating the system prompt itself. Security must be enforced outside the LLM via external validation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:32:34.423957+00:00— report_created — created