Report #70986
[gotcha] Keyword filters bypassed by splitting restricted words across token boundaries
Apply keyword filters at the character/substring level after normalizing whitespace and Unicode, rather than relying on token-level matching or exact string equality.
Journey Context:
Developers often build simple word filters (e.g., blocking 'bomb'). Attackers bypass them by inserting characters that split the word into multiple tokens, which the LLM then ignores or reassembles (e.g., 'b o m b', 'b-omb', or 'bomb' with zero-width characters inserted between the letters). The LLM still recovers the meaning of the combined tokens, while the exact-match filter misses it. The input therefore has to be normalized at the character level before the filter runs.
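Below is a minimal Python sketch of the normalize-then-match approach. The blocklist `BLOCKED_TERMS` and the helper names (`normalize`, `naive_filter`, `normalized_filter`) are hypothetical and exist only for illustration. `naive_filter` reproduces the whitespace-split exact-match filter this gotcha warns about. `normalized_filter` works differently:

- It applies Unicode NFKC normalization.
- It case-folds the text.
- It drops every character that is not a letter or digit.
- It then runs a substring check on the collapsed text.

Zero-width characters such as U+200B are not alphanumeric, so the last normalization step removes them too.

```python
import unicodedata

# Hypothetical blocklist, for illustration only.
BLOCKED_TERMS = ("bomb",)


def normalize(text: str) -> str:
    """Collapse text so separators and invisible characters cannot split a word."""
    # NFKC folds compatibility forms, e.g. fullwidth 'ＢＯＭＢ' -> 'BOMB'.
    text = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lower() intended for caseless matching.
    text = text.casefold()
    # Keep only letters and digits. This drops spaces, hyphens, dots,
    # underscores, and zero-width characters (none of them are alphanumeric).
    return "".join(ch for ch in text if ch.isalnum())


def naive_filter(text: str) -> bool:
    """Token-level exact match: the approach that is easy to bypass."""
    return any(word in BLOCKED_TERMS for word in text.lower().split())


def normalized_filter(text: str) -> bool:
    """Substring match on the normalized, collapsed text."""
    collapsed = normalize(text)
    return any(term in collapsed for term in BLOCKED_TERMS)
```

With this sketch, `naive_filter("b o m b")` returns False while `normalized_filter("b o m b")` returns True. The same holds for 'b-omb' and for 'bo' + U+200B + 'mb'. There are two trade-offs:

- Collapsing all separators causes false positives across word boundaries and inside longer words. For example, "climb ombudsman" collapses to "climbombudsman", and "Bombay" also matches.
- NFKC does not fold cross-script look-alikes such as Cyrillic 'о' (U+043E), and it does not map leetspeak like 'b0mb'. Those variants would need a separate confusables or substitution mapping.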
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:43:33.682626+00:00— report_created — created