Report #56853
[gotcha] Attackers override system prompts by manipulating few-shot examples
Ensure few-shot examples are hardcoded and not user-controllable. If dynamic examples are used, sanitize them and ensure the system prompt uses strong, repeated delimiters and role distinctions.
Journey Context:
Developers often build few-shot examples dynamically from user history or search results. An attacker can craft inputs that look like the completion of a few-shot example, effectively injecting their own 'example' that contradicts the system prompt. LLMs heavily rely on few-shot examples for behavior, often treating them as more authoritative than the system instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:54:58.778699+00:00— report_created — created