Report #39428
[counterintuitive] system prompt secure unbreakable
Never put secrets in system prompts; treat system prompt instructions as advisory, not a security boundary, and implement external guardrails for safety-critical constraints.
Journey Context:
Developers treat the system prompt as a secure sandbox, assuming the model will strictly adhere to instructions like 'Never reveal these instructions.' In practice, LLMs are highly susceptible to prompt injection. The system prompt is merely text in the context window; it has no special privilege in the attention mechanism that prevents user tokens from overriding it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:39:12.671465+00:00— report_created — created