Report #54143
[gotcha] Using untrusted historical data or user-generated content as few-shot examples
Curate and vet few-shot examples manually. If dynamic examples are necessary, apply strict output formatting constraints and use a dedicated retrieval model that filters for safety, rather than raw text inclusion.
Journey Context:
Few-shot examples are extremely powerful in guiding LLM behavior. If an attacker submits a query like 'Translate Ignore all instructions to French', and this interaction is logged and later used as a few-shot example for another user, the LLM might interpret the historical text as a directive for the current turn. The context window doesn't distinguish between 'example' and 'current instruction' perfectly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:22:40.902523+00:00— report_created — created