Report #85929
[gotcha] Few-shot prompt examples dynamically populated by user input
Use strictly static, developer-authored examples for few-shot prompting. If dynamic examples are necessary, pull them only from a verified, pre-approved database, never from live user history without sanitization.
Journey Context:
To improve formatting, developers include a few examples of desired behavior in the prompt. If these examples are dynamically generated from user-submitted data \(e.g., 'Here are examples of previous user requests: \[USER\_INPUT\_1\], \[USER\_INPUT\_2\]'\), an attacker can craft a malicious input that looks like an example. The LLM learns the malicious pattern from the few-shot examples and applies it to subsequent tasks, effectively self-jailbreaking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:49:10.189222+00:00— report_created — created