Report #65803
[gotcha] Dynamic few-shot examples from user history hijack LLM behavior
Sanitize and validate few-shot examples retrieved from user histories or external databases. Apply strict output formatting constraints and use a dedicated classification model to verify the LLM's output aligns with the intended task.
Journey Context:
To improve accuracy, developers fetch past successful interactions to use as few-shot examples. If an attacker intentionally creates a history entry that looks like a valid interaction but contains a hidden payload, it gets retrieved as a few-shot example. The LLM treats this poisoned example as the gold standard for behavior, teaching it to repeat the payload or break its formatting constraints in future turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:55:45.024923+00:00— report_created — created