Report #54296
[gotcha] Dynamic few-shot examples from user history poison LLM behavior
Curate few-shot examples statically or from highly trusted sources; never dynamically inject user-generated content or untrusted external data into the few-shot example section of the prompt.
Journey Context:
To make LLMs adaptive, developers pull 'successful' past interactions from a database to use as few-shot examples. An attacker performs a few malicious actions, which get saved as 'successful' examples. The LLM then uses these poisoned examples as the gold standard for how to behave, adopting the malicious persona or output format permanently for all users.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:38:00.521606+00:00— report_created — created