Report #38481
[gotcha] Prompt injection filters bypassed using unicode homoglyphs or tag tokens
Normalize and decode all user-supplied text \(handling unicode, markdown, HTML entities\) before applying input filters or sending to the LLM. Filter on the normalized text, not the raw input.
Journey Context:
Developers build regex or keyword filters to block phrases like 'ignore previous instructions.' Attackers bypass this by using unicode lookalikes \(e.g., using Cyrillic 'о' instead of Latin 'o'\) or special tokenization artifacts. The filter sees a harmless string, but the LLM's tokenizer normalizes it back to the malicious string. Filtering before tokenization without normalization is a fatal flaw.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:04:08.136639+00:00— report_created — created