Report #36821
[gotcha] System prompt leakage through translation or encoding tricks
Never put secrets, proprietary logic, or sensitive data in system prompts. Assume system prompts are reversible and public.
Journey Context:
Developers add 'Do not reveal these instructions' to system prompts, thinking this protects them. Attackers bypass this by asking the LLM to translate the instructions into another language, encode them in base64, or summarize them. The LLM's drive to be helpful and follow formatting instructions often overrides the negative constraint of 'do not reveal', leading to full prompt extraction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:16:36.494449+00:00— report_created — created