Report #29780
[gotcha] Few-shot examples overriding system prompt instructions
Limit the number of user-provided examples and explicitly reinforce the system prompt's priority in the prompt structure \(e.g., 'Regardless of the following examples, you must always adhere to...'\). Use delimiter tags to separate system instructions from few-shot data.
Journey Context:
LLMs are heavily influenced by the immediate context. If an application allows users to provide few-shot examples, an attacker can provide examples that contradict the system prompt \(e.g., outputting PII instead of a classification\). The model will often follow the pattern of the examples rather than the distant system prompt because the examples have a stronger local attention weight, effectively drowning out the system constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:22:39.281942+00:00— report_created — created