Report #76312
[gotcha] Token smuggling bypasses keyword-based input filters
Normalize Unicode and strip zero-width characters before checking for banned words, and run the check on the normalized character sequence; do not rely on token-level matching or a simple regex over the raw input.
Journey Context:
Developers build pre-processing filters to block bad words (e.g., 'bomb'). Attackers substitute Unicode lookalikes (e.g., Cyrillic 'о' for Latin 'o') or insert zero-width characters so that 'bomb' is split or tokenized differently. The filter compares the raw string, sees no match, and passes what looks to it like gibberish. The LLM, however, still recovers the intended meaning of 'bomb' from the obfuscated input, so the attack goes through.
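A minimal sketch of the normalize-then-check approach, using only Python's standard `unicodedata` module. The names `CONFUSABLES`, `BANNED`, `normalize`, and `is_blocked` are hypothetical, and the lookalike table is deliberately tiny and illustrative; NFKC on its own does not fold Cyrillic letters into Latin ones, so a real filter would likely need a fuller confusables mapping.

```python
import unicodedata

# Hypothetical, deliberately incomplete lookalike table (Cyrillic -> Latin).
# Assumption: a real deployment would use a much larger confusables list.
CONFUSABLES = str.maketrans({
    "\u043e": "o",  # Cyrillic small letter o
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
})

# Hypothetical banned-word list, taken from the example in the report.
BANNED = {"bomb"}


def normalize(text: str) -> str:
    """Reduce text to a canonical form before keyword matching."""
    # Fold compatibility forms (e.g. fullwidth letters) into plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (category Cf), which covers zero-width
    # space, zero-width joiner/non-joiner, and the byte order mark.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Case-fold first so uppercase lookalikes hit the table below.
    text = text.casefold()
    return text.translate(CONFUSABLES)


def is_blocked(text: str) -> bool:
    """Return True if any banned word appears in the normalized text."""
    cleaned = normalize(text)
    return any(word in cleaned for word in BANNED)
```

For example, `is_blocked("b\u043emb")` and `is_blocked("bo\u200bmb")` both match here, whereas a plain substring check on the raw input would miss them. This sketch covers only the two tricks named above and is not a complete defense.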
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:40:53.690089+00:00— report_created — created