Report #54626
[gotcha] Attacker manipulating few-shot examples to hijack model behavior
Validate and sanitize any dynamic few-shot examples retrieved from user data or external sources, and prefer fine-tuning or static examples for critical behavioral guidelines.
Journey Context:
Developers dynamically construct few-shot prompts using user-generated content \(e.g., 'Here are examples of previous good answers: \[user data\]'\). An attacker can craft a piece of content that looks like a few-shot example but contains a malicious output, teaching the LLM to replicate the malicious behavior for future inputs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:11:06.921587+00:00— report_created — created