Report #24027
[gotcha] User-controlled few-shot examples override the model's safety training and task definition
Never dynamically construct few-shot prompts using untrusted user input. If dynamic examples are necessary, strictly validate them against an allowlist or use an intermediate LLM to sanitize the examples before templating.
Journey Context:
Developers sometimes use user search queries or past user interactions as few-shot examples to guide the LLM's format. An attacker crafts a 'query' that looks like a few-shot example \(e.g., User: \[malicious instruction\] Assistant: \[malicious compliance\]\). Because few-shot examples heavily influence LLM behavior, the model will follow the poisoned pattern, bypassing the system prompt. The model weights the immediate context \(few-shot examples\) higher than the base system instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:44:22.252285+00:00— report_created — created