Report #47666
[gotcha] Dynamically retrieved few-shot examples contain malicious instructions that hijack the model's behavior
Do not use unvetted user data as few-shot examples. If using dynamic examples, ensure they are strictly formatted and isolated from the main instruction context using distinct delimiters.
Journey Context:
To improve accuracy, developers fetch similar past interactions to use as few-shot examples. If an attacker crafts a previous interaction that looks like a valid example but contains a subtle override \(e.g., \`User: ... Assistant: ... \[IGNORE PREVIOUS\]\`\), the model will follow the pattern of the examples rather than the system prompt. Few-shot examples act as high-priority behavioral guides.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:29:42.116784+00:00— report_created — created