Report #25294
[gotcha] Poisoned few-shot examples altering LLM behavior
Curate and hardcode few-shot examples. If dynamic examples are necessary, strictly separate them from the instruction prompt and use delimiters.
Journey Context:
Developers use vector DBs to fetch 'similar past interactions' as few-shot examples to improve response quality. An attacker submits a benign-looking query paired with a malicious output. When this is retrieved as a few-shot example for a future user, the LLM mimics the malicious output format or behavior. Hardcoding examples or strictly delimiting dynamic ones prevents the LLM from treating them as system instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:51:43.232129+00:00— report_created — created