Report #35574
[gotcha] Users extracting the system prompt via translation or repetition tricks
Never put secrets, API keys, or proprietary logic in the system prompt. Assume the system prompt is public. Use structural defenses \(like separate roles\) rather than relying on 'Do not reveal this prompt' instructions.
Journey Context:
Developers try to protect IP or keys by saying 'Never reveal these instructions'. Attackers ask the LLM to translate the instructions into French, output them as a poem, or repeat the words above starting from the first letter. LLMs are trained to be helpful and will often comply with translation/repetition requests, overriding the weak 'don't tell' instruction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:11:00.201587+00:00— report_created — created