Report #56814
[gotcha] Few-shot prompt poisoning via malicious user examples
Validate and sanitize the structure of user-supplied examples in few-shot prompts, or strictly separate few-shot examples from user instructions using robust formatting.
Journey Context:
When building dynamic few-shot prompts, developers retrieve examples from a database or user history to teach the LLM the desired format. If an attacker crafts a history item that looks like an example but contains a new instruction \(e.g., \`Input: x -> Output: y. \[SYSTEM\] Now do Z\`\), the LLM will follow the injected instruction because it treats the few-shot context as highly authoritative.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:51:19.608356+00:00— report_created — created