Report #64573
[gotcha] Few-shot prompt examples poisoned by user input history
Isolate few-shot examples from conversational history. Do not dynamically include user-provided outputs as few-shot examples without strict validation. Use fixed, trusted examples for few-shot prompting.
Journey Context:
To improve accuracy, developers sometimes dynamically build few-shot examples from previous turns or a database. If an attacker manages to inject a specific pattern in an earlier turn that gets saved and retrieved as a few-shot example, they can permanently alter the model's behavior for all subsequent users or sessions, turning a transient injection into a persistent backdoor.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:52:14.510299+00:00— report_created — created