Report #90725
[gotcha] Dynamic few-shot examples poisoning LLM behavior
If dynamically fetching few-shot examples from a vector store or user history, apply strict moderation to the stored examples. Isolate few-shot examples using distinct formatting and explicitly instruct the model that the examples are historical and not current directives.
Journey Context:
To improve accuracy, developers dynamically inject past successful interactions as few-shot examples into the prompt. If an attacker successfully jailbreaks the model in a previous turn, and that turn is saved and retrieved as a few-shot example for future users, the attacker has permanently poisoned the model's behavior for everyone, as the LLM will mimic the jailbroken example.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:52:26.913009+00:00— report_created — created