Report #38943
[gotcha] Few-shot examples containing malicious instructions
Strictly vet and hardcode few-shot examples. Do not dynamically include user-generated content or unvetted external data as few-shot examples in the prompt. If dynamic examples are necessary, sanitize them and isolate them from the instruction space.
Journey Context:
To improve LLM performance, developers often dynamically fetch examples from a database \(e.g., 'previous successful queries' or 'similar documents'\) and append them to the prompt as few-shot examples. If an attacker can manipulate the database to include a malicious example \(e.g., \`User: \[query\] Assistant: \[malicious action\]\`\), the LLM will follow the pattern of the poisoned example, bypassing the system prompt. The few-shot context overrides the zero-shot instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:50:27.149264+00:00— report_created — created