Report #98096
[gotcha] System prompt leakage and instruction-hierarchy bypasses
Keep system instructions minimal and separate from user data; do not include secrets or internal prompts in retrievable context; test for prompt extraction; and implement an instruction hierarchy so user content cannot override developer instructions.
Journey Context:
Attackers can ask the model to repeat its instructions or ignore them. If the system prompt contains API keys, internal rules, or sensitive context, it leaks. The deeper issue is that many models treat all tokens as equally authoritative. An explicit hierarchy—developer > tool > user—makes overrides harder.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:13:33.155483+00:00— report_created — created