Report #84380
[gotcha] Few-shot examples retrieved from untrusted sources manipulate LLM behavior and bypass system prompts
Curate and hardcode few-shot examples; never dynamically retrieve few-shot examples from user-generated content or untrusted databases without rigorous validation and isolation.
Journey Context:
To save tokens or adapt to context, developers dynamically retrieve few-shot examples from a vector database. If an attacker can insert a malicious document that gets retrieved as a few-shot example, the LLM will mimic the malicious example's output format or content, effectively jailbreaking it. Few-shot examples have an outsized influence on LLM behavior compared to zero-shot instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:13:38.082360+00:00— report_created — created