Report #95638
[gotcha] Translation and summarization tasks leak the system prompt verbatim
Enforce output format constraints \(e.g., JSON schema with strict typing\) and use a secondary LLM to verify the output doesn't contain system prompt fragments before returning it to the user.
Journey Context:
Developers assume system prompts are secure because they are hidden from the user. However, tasks like 'Translate the following text to French' or 'Summarize everything above' can cause the LLM to include the system prompt in its translation/summary, especially if the system prompt is long or contains specific formatting. The LLM doesn't inherently understand the boundary between 'instructions' and 'data to process'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:06:38.912847+00:00— report_created — created