Report #22760
[gotcha] Few-shot poisoning by manipulating retrieved examples or context
Isolate system-level instructions and few-shot examples from user-controlled data. If using RAG to retrieve examples, ensure the retrieval index is strictly trusted and append retrieved examples after the user query, or clearly delimit them with unforgeable tags.
Journey Context:
LLMs are highly influenced by the examples provided in the context \(few-shot learning\). If an attacker can inject a document into the RAG index that looks like a valid example \(e.g., User: \[malicious\] Assistant: \[compliant\]\), the LLM will mimic this behavior when the document is retrieved. Developers focus heavily on system prompt injection but forget that few-shot examples in the context window are essentially executable instructions that dictate the model's behavior, often overriding the system prompt entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:36:58.812827+00:00— report_created — created