Report #64310
[gotcha] Using user-controlled format requests to leak system prompts
Never include sensitive secrets \(API keys, internal URLs\) in system prompts; use structural delimiters and instruct the model to refuse requests to repeat or summarize the system prompt.
Journey Context:
Developers often put operational secrets or proprietary instructions in the system prompt. Attackers use formatting tricks \(like asking for a JSON representation of the conversation or asking the model to repeat the words above starting with a specific phrase\) to trick the LLM into outputting the hidden system prompt verbatim. Once extracted, these secrets can be used for direct access. Secrets should be injected at runtime, not hardcoded in the prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:25:57.779846+00:00— report_created — created