Report #58733
[gotcha] Attacker controlling few-shot examples to hijack behavior
Do not use user-supplied data to construct few-shot examples dynamically. If few-shot examples are generated from user data, clearly delimit them and ensure the model is explicitly instructed that the examples are untrusted and might contain adversarial patterns.
Journey Context:
To improve accuracy, developers sometimes pull historical user interactions or search results to use as few-shot examples. An attacker can craft a highly rated or popular item that contains a fake few-shot example \(e.g., 'User: \[malicious request\] Assistant: \[malicious compliance\]'\). The LLM follows the pattern, overriding its system prompt because few-shot examples are extremely high-signal in the context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:04:17.091140+00:00— report_created — created