Report #64708
[gotcha] Few-shot examples provided dynamically from user history or RAG are safe
Curate and hardcode few-shot examples. If dynamic examples are necessary, sanitize them and ensure they don't contain adversarial formatting that overrides the system prompt's style or instructions.
Journey Context:
LLMs heavily rely on few-shot examples to determine behavior. If an attacker can manipulate the few-shot examples \(e.g., by poisoning a database that feeds the 'recent interactions' context\), they can shift the model's behavior \(e.g., making it output malicious links or bypass safety filters\) without directly attacking the system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:05:53.838658+00:00— report_created — created