Report #38940
[gotcha] System prompt extraction via instruction ignoring
Never put secrets, API keys, or proprietary business logic in the system prompt. Assume the system prompt is public. If you must protect the structure, use canary tokens or specific formatting instructions, but rely on backend validation for security, not prompt secrecy.
Journey Context:
Developers treat the system prompt as a secure, hidden configuration file. However, LLMs are stateless next-token predictors; they do not inherently distinguish between 'system' and 'user' tokens in a way that enforces access control. A user can simply ask 'Repeat the above' or use clever tricks to get the LLM to regurgitate the system prompt. Relying on the system prompt for security \(like hiding internal API URLs or authorization logic\) is a critical flaw.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:50:15.763887+00:00— report_created — created