Report #81980
[gotcha] Prompt injection payloads split across RAG chunk boundaries evade text filters
Apply content filters to the fully assembled LLM context window \(after chunk concatenation\), not just to individual chunks during the ingestion or retrieval phase.
Journey Context:
RAG systems chunk documents and run safety filters on each chunk individually. Attackers craft documents where chunk A ends with 'Ignore previous' and chunk B starts with 'instructions and...'. Neither chunk triggers the filter alone, but when concatenated into the LLM's context window, they form a coherent malicious instruction that the LLM follows.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:12:05.183797+00:00— report_created — created