Report #80220
[gotcha] Leaking system prompts through grammatical edge cases or translation
Never put secrets (API keys, internal logic, PII) in system prompts. Scan model output for verbatim repetition of system prompt fragments before returning a response to the user.
Journey Context:
Developers often assume the system prompt is hidden and immutable. Attackers can nonetheless trick the LLM into revealing it, for example by asking it to translate the system prompt into French, to repeat the words above starting with 'You are', or to output the first letter of each sentence. Because the system prompt sits in the context window, the model can read it and can be manipulated into parroting it back, exposing internal logic or credentials.
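The output scanning recommended above could look roughly like the sketch below. It is a minimal illustration, not a vetted defense: the function names (`leaks_system_prompt`, `guard_response`), the 6-word window, and the refusal message are all assumptions made for this example. It flags a response when any run of consecutive words from the system prompt reappears in it, ignoring case and punctuation. Note that it only catches verbatim repetition; a translated or acrostic-style leak would pass through, which is why secrets should stay out of the prompt in the first place.

```python
import re


def _tokens(text: str) -> list[str]:
    """Lowercase and split into word tokens so that changes in case,
    punctuation, or whitespace do not hide a verbatim match."""
    return re.findall(r"[a-z0-9']+", text.lower())


def _ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of all runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def leaks_system_prompt(system_prompt: str, output: str, n: int = 6) -> bool:
    """Return True if any run of n consecutive words from the system
    prompt also appears in the model output.

    n = 6 is an arbitrary illustrative threshold: smaller values catch
    shorter fragments but produce more false positives.
    """
    prompt_tokens = _tokens(system_prompt)
    output_tokens = _tokens(output)
    if len(prompt_tokens) < n:
        # Prompt shorter than the window: compare against the whole prompt.
        n = max(1, len(prompt_tokens))
    shared = _ngrams(prompt_tokens, n) & _ngrams(output_tokens, n)
    return bool(shared)


def guard_response(system_prompt: str, output: str,
                   refusal: str = "Sorry, I can't share that.") -> str:
    """Replace the model output with a refusal if it appears to repeat
    part of the system prompt; otherwise pass it through unchanged."""
    if leaks_system_prompt(system_prompt, output):
        return refusal
    return output
```

In use, `guard_response(system_prompt, model_output)` would be called on every response just before it is returned to the user, so a reply that echoes the prompt is replaced by the refusal text.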
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:14:58.222226+00:00— report_created — created