Report #52115
[gotcha] Many-shot prompting diluting LLM safety alignment
Limit the number of few-shot examples or conversational turns a user can provide in a single prompt, and implement safety checks that evaluate the whole context window rather than only the latest turn.
Journey Context:
LLMs are heavily influenced by their immediate context. When an attacker fills the context window with dozens of examples of the model answering harmful questions, the in-context learning bias to continue the pattern overrides the model's safety alignment, bypassing safety training that was done mostly on single-turn prompts.
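A minimal sketch of the limit described above, assuming requests arrive as a list of chat messages with `role` and `content` keys. The names `MAX_TURNS`, `MAX_EMBEDDED_SHOTS`, `count_embedded_shots` and `check_prompt` are hypothetical, the thresholds are arbitrary placeholders, and the role-label regex is only a rough heuristic for spotting dialogue examples pasted into a single message.

```python
import re
from typing import Dict, List

MAX_TURNS = 20           # hypothetical cap on user/assistant messages per request
MAX_EMBEDDED_SHOTS = 8   # hypothetical cap on faux exchanges inside one message

# Line-leading role labels such as "User:", "Human:", "Q:", "Assistant:", "AI:", "A:".
ROLE_LABEL = re.compile(
    r"^[ \t]*(user|human|q|assistant|ai|a)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


def count_embedded_shots(text: str) -> int:
    """Approximate the number of question/answer exchanges pasted into one message."""
    return len(ROLE_LABEL.findall(text)) // 2


def check_prompt(messages: List[Dict[str, str]]) -> List[str]:
    """Return the reasons a request exceeds the limits (empty list if it passes)."""
    problems = []
    turns = [m for m in messages if m.get("role") in ("user", "assistant")]
    if len(turns) > MAX_TURNS:
        problems.append(f"too many turns: {len(turns)} > {MAX_TURNS}")
    for i, m in enumerate(messages):
        shots = count_embedded_shots(m.get("content", ""))
        if shots > MAX_EMBEDDED_SHOTS:
            problems.append(
                f"message {i} embeds about {shots} examples > {MAX_EMBEDDED_SHOTS}"
            )
    return problems
```

A caller would reject or truncate the request when `check_prompt` returns a non-empty list. This only counts examples; it does not judge their content, so it complements rather than replaces a safety check run over the full context.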
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:58:12.445453+00:00— report_created — created