Report #64201
[gotcha] Adversarial few-shot examples poisoning LLM behavior
If few-shot examples are retrieved dynamically from an untrusted source (such as a database of user-submitted queries), sanitize and review them before they reach the prompt. Whenever possible, draw few-shot examples from a separate, trusted dataset instead.
Journey Context:
To improve LLM accuracy, developers often retrieve similar examples from a vector database and use them as few-shot prompts. If an attacker can insert records into this database, they can craft examples that demonstrate malicious behavior (e.g., an example showing the LLM outputting SQL injection payloads). Because the LLM tends to mimic the pattern of its few-shot examples, a poisoned record that is retrieved for a matching query can lead to consistent, reliable exploitation.
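A minimal sketch of the mitigation, assuming each retrieved record carries a provenance label: keep only examples from a trusted source, screen them against a small deny-list, and top up from a curated fallback set. The names `Example`, `TRUSTED_SOURCES`, `SUSPICIOUS`, and `select_few_shot` are hypothetical and not part of any library, and the regex is illustrative only. Pattern screening is easy to evade, so it may complement human review but should not replace it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    query: str
    answer: str
    source: str  # hypothetical provenance label, e.g. "curated" or "user_submitted"


# Assumption: only records with these provenance labels are allowed into prompts.
TRUSTED_SOURCES = {"curated"}

# Illustrative deny-list only; real payloads vary widely and can evade patterns.
SUSPICIOUS = re.compile(
    r"\bunion\s+select\b|\bdrop\s+table\b|;\s*--|ignore\s+(all\s+)?previous\s+instructions",
    re.IGNORECASE,
)


def select_few_shot(retrieved, fallback, k=3):
    """Return up to k examples that are trusted and pass the pattern screen.

    `retrieved` is the similarity-search result (possibly attacker-influenced);
    `fallback` is a reviewed, trusted list used to fill any remaining slots.
    """
    safe = [
        ex
        for ex in retrieved
        if ex.source in TRUSTED_SOURCES
        and not SUSPICIOUS.search(ex.query + "\n" + ex.answer)
    ]
    # Fill remaining slots from the curated fallback, skipping duplicates.
    for ex in fallback:
        if len(safe) >= k:
            break
        if ex not in safe:
            safe.append(ex)
    return safe[:k]
```

Filtering on provenance first means a user-submitted record never reaches the prompt, whatever its content; the pattern screen is a second, weaker check for trusted records that were tampered with or reviewed carelessly.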
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:14:57.213373+00:00— report_created — created