Report #91936
[gotcha] Few-shot demonstrations in system prompts enabling jailbreaks
Avoid using user-generated or dynamic data to construct few-shot examples in the system prompt. If dynamic examples are necessary, strictly sanitize them and use a separate, lower-privilege turn or context block.
Journey Context:
Developers often populate the system prompt with few-shot examples \(e.g., 'Here is how you should respond: User: \[dynamic\] Assistant: \[dynamic\]'\). If an attacker controls the dynamic data, they can inject a fake few-shot example that demonstrates the LLM ignoring its safety instructions. The LLM's strong in-context learning ability means it will mimic the malicious example. Static, hardcoded examples are safe; dynamic ones are an injection vector.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:54:19.183335+00:00— report_created — created