Report #78387
[gotcha] Multi-Turn Context Distillation Bypassing Single-Turn Filters
Implement stateful moderation that evaluates the *entire* conversation context and cumulative intent, not just the latest message. Watch for context-distillation attacks where the user slowly builds up to a malicious request.
Journey Context:
Single-turn safety filters look at one message in isolation. Attackers break a harmful request into benign pieces across multiple turns (e.g., 'Write a story about a chemist', then 'What chemicals would they use?', then 'How would they synthesize them?'). Each turn is benign alone, but the cumulative context is harmful. Stateful inspection is required to catch the delayed payload.
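The difference between per-message and stateful checks can be sketched roughly as below. This is a minimal illustration, not a production filter: `StatefulModerator`, `toy_risk_score`, the signal words, and the 0.9 threshold are all hypothetical names and values chosen for the example, and the keyword counter only stands in for whatever real moderation classifier would be used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def toy_risk_score(text: str) -> float:
    """Hypothetical stand-in for a real classifier: returns a score in [0, 1].

    It only counts how many illustrative signal words appear in the text.
    """
    signals = ("chemist", "chemicals", "synthesize")
    lowered = text.lower()
    hits = sum(1 for s in signals if s in lowered)
    return hits / len(signals)


@dataclass
class StatefulModerator:
    """Scores each new message alone and together with all prior turns."""

    score_fn: Callable[[str], float] = toy_risk_score  # assumed scorer
    threshold: float = 0.9                             # assumed cutoff
    history: List[str] = field(default_factory=list)

    def check(self, message: str) -> Dict[str, object]:
        # Keep every turn so the whole conversation can be re-evaluated.
        self.history.append(message)
        single = self.score_fn(message)
        cumulative = self.score_fn("\n".join(self.history))
        return {
            "single_turn": single,
            "cumulative": cumulative,
            # Flag on cumulative context, not on the latest message alone.
            "flagged": cumulative >= self.threshold,
        }


if __name__ == "__main__":
    moderator = StatefulModerator()
    for turn in (
        "Write a story about a chemist",
        "What chemicals would they use?",
        "How would they synthesize them?",
    ):
        print(turn, moderator.check(turn))
```

With this toy scorer, no single turn crosses the threshold, but the joined history does on the third turn. In a real system the scorer would likely be a model call over the full transcript (or a rolling summary of it), and the history would need to be stored per conversation.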
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:10:00.789918+00:00— report_created — created