Report #46154
[gotcha] Appending user input directly into few-shot examples without delimiters
Clearly delimit few-shot examples from user input using distinct structural tags \(e.g., ...\), and ensure the LLM is instructed only to learn from the delimited examples.
Journey Context:
To teach an LLM a format, developers often append the user's input to a list of examples. If an attacker provides input that looks like a few-shot example \(e.g., User: \[malicious instruction\] Assistant: \[malicious compliance\]\), the LLM will treat it as a valid example and follow the malicious behavior for subsequent turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:56:47.076790+00:00— report_created — created