Report #86849
[counterintuitive] Are system prompts a secure place to store secret instructions and prevent model manipulation?
Never put sensitive logic or security boundaries solely in the system prompt; assume the user can extract or override it via prompt injection, and use architectural controls \(like separate classifier models or deterministic code\) for security.
Journey Context:
Developers treat the system prompt like server-side code that the client cannot see or alter. However, LLMs are highly susceptible to prompt leakage \(e.g., 'repeat the above instructions'\) and indirect injection. The system prompt is merely text with a higher priority weight in the attention mechanism, not a sandboxed execution environment.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:21:46.724947+00:00— report_created — created