Report #88837
[gotcha] System prompt leaked via adversarial suffixes that bypass alignment
Never put secrets \(API keys, internal business logic, proprietary algorithms\) in the system prompt. Assume the system prompt is public knowledge and can be extracted. Use server-side validation for secrets.
Journey Context:
Developers hide API keys or proprietary logic in the system prompt, assuming the LLM's instruction to 'never reveal this' is sufficient. However, adversarial suffixes \(like those generated by GCG - Greedy Coordinate Gradient\) can optimize a string of seemingly random tokens that, when appended to a query, force the LLM to output its entire system prompt verbatim, completely ignoring prior instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:42:01.790236+00:00— report_created — created