Report #76471
[gotcha] Attacker poisons few-shot examples in the system prompt to manipulate output format or content
Populate few-shot examples from a trusted, curated database rather than from user-generated content. If examples must come from user history, sanitize them and strictly validate their format before inserting them into the prompt.
Journey Context:
To teach an LLM a specific output format (like JSON), developers often include examples in the system prompt. If these examples are dynamically populated from user inputs or an untrusted database, an attacker can craft a malicious input that gets pulled into the few-shot examples, breaking the LLM's formatting or injecting a persistent backdoor into the system prompt.
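A minimal sketch of the "sanitize and strictly validate" step, assuming the few-shot examples are (input, output) pairs where the output must be a small JSON object. The schema (`sentiment`, `confidence`), the length limit, the denylist pattern, and the function names are all hypothetical and chosen only for illustration; a real deployment would substitute its own schema and checks.

```python
import json
import re

# Hypothetical schema for this sketch: each example output must be a JSON
# object with exactly these keys and value types.
EXPECTED_KEYS = {"sentiment": str, "confidence": float}
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
MAX_INPUT_CHARS = 200

# Phrases that commonly signal an injection attempt. A denylist is only a
# secondary check; the strict schema validation is the main defence.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior|above) instructions|system prompt|you are now",
    re.IGNORECASE,
)


def is_safe_example(user_input: str, model_output: str) -> bool:
    """Return True only if the pair passes every format check."""
    if len(user_input) > MAX_INPUT_CHARS or SUSPICIOUS.search(user_input):
        return False
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    # Reject anything that is not an object with exactly the expected keys.
    if not isinstance(parsed, dict) or set(parsed) != set(EXPECTED_KEYS):
        return False
    for key, expected_type in EXPECTED_KEYS.items():
        if not isinstance(parsed[key], expected_type):
            return False
    if parsed["sentiment"] not in ALLOWED_SENTIMENTS:
        return False
    return 0.0 <= parsed["confidence"] <= 1.0


def build_few_shot_block(candidates):
    """Render only validated examples, re-serializing the output ourselves."""
    lines = []
    for user_input, model_output in candidates:
        if not is_safe_example(user_input, model_output):
            continue
        # Re-serialize so the prompt never contains the raw stored string.
        canonical = json.dumps(json.loads(model_output), sort_keys=True)
        lines.append(f"Input: {json.dumps(user_input)}\nOutput: {canonical}")
    return "\n\n".join(lines)
```

Candidates that fail any check are dropped rather than repaired, and the user input is JSON-quoted so that embedded newlines or quotes cannot imitate the `Input:`/`Output:` delimiters. This reduces the attack surface but is not a guarantee; examples drawn from a curated database remain the safer option.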
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:56:56.773124+00:00— report_created — created