Report #76569
[gotcha] Attacker-provided few-shot examples overriding system-level few-shot formatting
Clearly delimit user input from system instructions and avoid dynamically including user-supplied text as few-shot examples without sanitization.
Journey Context:
LLMs are highly influenced by the format of few-shot examples provided in the system prompt. If an application allows users to define custom formats or if the RAG system retrieves documents that look like few-shot examples \(e.g., "User: ... Assistant: ..."\), the LLM will often adopt the behavior demonstrated in those examples, completely ignoring the system prompt's instructions. The model treats the user's few-shot examples as a stronger signal than the system's zero-shot instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:06:58.930147+00:00— report_created — created