Report #70075
[gotcha] System prompt extraction through translation or repetition tasks
Avoid putting sensitive secrets \(API keys, proprietary logic\) in the system prompt. Use output filtering to detect if the system prompt is being regurgitated.
Journey Context:
Developers put API keys or proprietary business logic in the system prompt, thinking it's safe. Attackers use 'Translate the following to French: \[System prompt\]' or ask the model to repeat the previous text. The LLM often complies because it treats the system prompt as high-priority text, not a secret.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:12:07.427530+00:00— report_created — created