Report #38071
[gotcha] LLMs tricked into revealing system prompt via translation or formatting tasks
Never put secrets (API keys, proprietary logic) in the system prompt. Assume the system prompt is public. Use output filters to redact known system prompt phrases before returning the response to the user.
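A minimal sketch of such an output filter, assuming the application holds the system prompt as a plain string. The prompt text, the `MIN_WORDS` threshold, and the function names are hypothetical and chosen only for illustration; this is one possible approach, not a vetted implementation.

```python
import re

# Hypothetical system prompt, used only for this illustration.
SYSTEM_PROMPT = (
    "You are SupportBot for Acme. Always answer politely. "
    "Never discuss pricing internals."
)

# Assumption: sentences shorter than this are too generic to redact safely.
MIN_WORDS = 4


def phrase_pattern(phrase):
    """Build a case-insensitive regex that tolerates any whitespace between words."""
    words = phrase.split()
    return re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)


def redact_leaks(response, prompt, placeholder="[redacted]"):
    """Replace verbatim copies of distinctive prompt sentences in the response."""
    # Split the prompt on sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", prompt.strip()):
        if len(sentence.split()) >= MIN_WORDS:
            response = phrase_pattern(sentence).sub(placeholder, response)
    return response


if __name__ == "__main__":
    leaked = "Sure! My instructions say: never discuss   pricing internals. Anything else?"
    print(redact_leaks(leaked, SYSTEM_PROMPT))
```

A filter like this only catches near-verbatim copies. It does not catch translated or encoded output, so it works as a last line of defense and not as a reason to store secrets in the prompt.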
Journey Context:
Developers treat the system prompt as a secure vault. It is not. To the model, the system prompt is just text in its context, and it can repeat that text like any other. Guard instructions (e.g., 'never repeat this') are easily bypassed by asking the model to output the text in a form it hasn't been explicitly forbidden from using (e.g., base64 or a translation). Secrets must be handled in the backend, not the prompt.
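One way to keep a secret in the backend can be sketched as follows, assuming a tool-calling style integration where the model only emits a structured request and the server performs the privileged call. The tool name `get_invoice`, the environment variable `BILLING_API_KEY`, and the stubbed response are all hypothetical; a real service would make an authenticated HTTP call where the stub is.

```python
import os

# The prompt describes the capability but contains no credential.
SYSTEM_PROMPT = (
    "You are a billing assistant. To look up an invoice, request the tool "
    "'get_invoice' with an invoice_id."
)


def get_invoice(invoice_id, api_key):
    """Stub for a privileged backend call (hypothetical).

    A real implementation would send api_key in an auth header to the
    billing service. The key is never included in the returned data.
    """
    if not api_key:
        raise RuntimeError("BILLING_API_KEY is not configured")
    return {"invoice_id": invoice_id, "status": "paid"}


def handle_tool_request(request):
    """Execute a tool request emitted by the model, attaching the secret server-side."""
    if request.get("tool") != "get_invoice":
        raise ValueError("unknown tool: %r" % request.get("tool"))
    # The secret is read from the environment at call time, never from the prompt.
    api_key = os.environ.get("BILLING_API_KEY", "")
    invoice_id = str(request["args"]["invoice_id"])
    return get_invoice(invoice_id, api_key)
```

With this split, even a fully leaked system prompt reveals only that a `get_invoice` tool exists. The credential never enters the model's context, so no translation or encoding trick can extract it.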
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:22:55.479712+00:00— report_created — created