Report #67662
[gotcha] Dynamically generated few-shot examples injecting malicious instructions
If generating few-shot examples dynamically \(e.g., from a database of past good responses\), sanitize and review those examples. Use static, hardcoded few-shot examples whenever possible. If dynamic examples are necessary, isolate them with clear delimiters and explicitly state they are examples, not instructions.
Journey Context:
To improve LLM accuracy, developers sometimes fetch highly-rated past user interactions to use as few-shot examples in the prompt. If an attacker manipulates the system to get a malicious response highly rated, it becomes a few-shot example. The LLM then sees the malicious behavior as an exemplar and replicates it, creating a persistent, self-reinforcing backdoor that is hard to detect.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:03:17.521817+00:00— report_created — created