Report #99953
[gotcha] System prompts leak through crafted extraction queries
Treat system prompts as public. Keep secrets, API keys, and sensitive business rules out of the prompt, enforce policy in code outside the model, and monitor for repeated extraction probes such as 'ignore previous' or 'repeat your instructions'.
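The probe detection mentioned above could be sketched roughly as follows. This is a minimal illustration, not a complete defense: the regex patterns, the `ProbeCounter` class, and the threshold are all hypothetical, and keyword matching is easy to evade with paraphrasing, so it is best treated as a monitoring signal rather than a security boundary.

```python
import re
from collections import defaultdict

# Hypothetical patterns based on the two example phrases in this report.
# A real deployment would tune these against its own traffic.
PROBE_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above)", re.IGNORECASE),
    re.compile(r"repeat (your|the) (instructions|system prompt)", re.IGNORECASE),
]


def looks_like_probe(message: str) -> bool:
    """Return True if the message matches any known extraction phrasing."""
    return any(p.search(message) for p in PROBE_PATTERNS)


class ProbeCounter:
    """Counts probe-like messages per user (in memory, for illustration only)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts = defaultdict(int)

    def record(self, user_id: str, message: str) -> bool:
        """Record a message; return True once the user reaches the threshold."""
        if looks_like_probe(message):
            self.counts[user_id] += 1
        return self.counts[user_id] >= self.threshold
```

A caller might create one `ProbeCounter` per service instance and, when `record` returns True, log the session or apply rate limiting. What to do on a hit is a policy decision this sketch leaves open.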
Journey Context:
Developers sometimes embed API keys or authorization logic in system prompts on the assumption that the model will not repeat them. Simple extraction attacks often succeed, however, especially against smaller or poorly aligned models, and even a partial leak helps an attacker tune later injections. The only robust fix is to keep sensitive data out of the prompt entirely.
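One way to keep sensitive data out of the prompt is to hold the secret in the server environment and check authorization in ordinary code before any privileged call. The sketch below assumes a hypothetical support assistant; the role names, the `ALLOWED_ACTIONS` table, the `BACKEND_API_KEY` variable, and `call_backend` are all invented for illustration.

```python
import os

# The prompt carries only behavior guidance: no keys, no authorization rules.
SYSTEM_PROMPT = "You are a support assistant. Answer questions about orders."

# Hypothetical policy table, enforced in code rather than described to the model.
ALLOWED_ACTIONS = {
    "customer": {"view_order"},
    "agent": {"view_order", "issue_refund"},
}


def is_allowed(role: str, action: str) -> bool:
    """Check the caller's role against the policy table."""
    return action in ALLOWED_ACTIONS.get(role, set())


def call_backend(action: str, role: str) -> str:
    """Run a model-requested action only if the authenticated role permits it."""
    if not is_allowed(role, action):
        raise PermissionError(f"{role!r} may not perform {action!r}")
    # Hypothetical variable name; the key is read server-side and never
    # appears in any text sent to the model.
    api_key = os.environ.get("BACKEND_API_KEY", "")
    # A real implementation would pass api_key to the backend request here.
    return f"{action} executed"
```

With this split, a leaked system prompt reveals nothing an attacker can use directly, and a successful injection still cannot trigger an action that the caller's role does not permit.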
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:20:22.000672+00:00— report_created — created