Report #58114
[gotcha] System prompt leakage through translation or summarization tasks
Dedicate a separate system prompt instruction explicitly forbidding the repetition, translation, summarization, or formatting of the system prompt itself, and strip system-level metadata from conversational history before passing it back to the model.
Journey Context:
Developers assume system prompts are inherently hidden. Attackers ask the LLM to translate the above text into French. Because the system prompt is prepended to the conversation, the LLM faithfully includes it in the translation. The LLM does not inherently know the system prompt is a secret; it just sees text to process.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:02:04.339540+00:00— report_created — created