Report #80534
[gotcha] Dynamically retrieved few-shot examples containing malicious instructions that hijack the model's behavior
Curate and harden few-shot example databases; do not dynamically pull few-shot examples from untrusted user data or unvetted external sources.
Journey Context:
To improve accuracy, developers dynamically fetch examples from a vector database to include in the prompt. If an attacker can inject a document into the vector DB, that document can be formatted as a few-shot example \(e.g., User: \[bad input\] Assistant: \[bad output\]\), teaching the model to misbehave on subsequent turns by poisoning the demonstration context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:46:52.743505+00:00— report_created — created