Report #38931
[gotcha] Unicode homoglyphs and invisible characters bypassing prompt filters
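Normalize and sanitize all user input before prompt construction. Strip invisible Unicode characters (such as zero-width spaces, zero-width joiners, and variation selectors) and map confusable homoglyphs (such as Cyrillic 'а' to Latin 'a') to a canonical form before applying safety filters or passing the text to the LLM. Standard normalization forms such as NFKC do not fold cross-script confusables, so homoglyph mapping needs its own step.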
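A minimal sketch of that sanitization step in Python, using only the standard library. The names `sanitize_for_prompt`, `is_invisible`, and `CONFUSABLES` are hypothetical, and the confusables table is a tiny hand-picked sample for illustration; a real deployment would presumably load the full Unicode confusables data instead.

```python
import unicodedata

# Illustrative sample only (assumption): a handful of Cyrillic letters that
# render like Latin ones. This is NOT a complete confusables table.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def is_invisible(ch: str) -> bool:
    """Return True for characters this sketch treats as invisible."""
    # General category Cf (format) covers U+200B ZWSP, U+200C ZWNJ,
    # U+200D ZWJ, U+FEFF, soft hyphen, and similar characters.
    if unicodedata.category(ch) == "Cf":
        return True
    # Variation selectors have category Mn, so match their ranges directly.
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def sanitize_for_prompt(text: str) -> str:
    """Normalize, strip invisible characters, then fold sample homoglyphs."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) but leaves
    # cross-script look-alikes such as Cyrillic 'o' untouched.
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if not is_invisible(ch))
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(sanitize_for_prompt("b\u043emb"))   # Cyrillic o inside the word
    print(sanitize_for_prompt("b\u200domb"))  # zero-width joiner inside the word
```

Stripping every format character is a blunt choice: it also breaks legitimate uses such as ZWJ emoji sequences and ZWNJ in Persian text. One option is to run the safety filter on the sanitized copy while keeping the original text for display.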
Journey Context:
Safety filters and LLMs often process text differently when invisible characters or homoglyphs are present. An attacker can disguise the word 'bomb' by substituting Cyrillic characters (bоmb): the result looks identical to the human eye and the LLM may still understand it, but an exact-string-match safety filter misses it. Likewise, invisible characters can split a word (b[ZWJ]omb) so that it evades the filter while the LLM still recovers the semantic meaning.
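The bypass can be reproduced with a few lines of Python. This is a sketch under assumptions: `BLOCKLIST` and `naive_filter_blocks` are hypothetical stand-ins for an exact-substring filter, not any particular product's implementation.

```python
# Hypothetical exact-substring safety filter, for illustration only.
BLOCKLIST = {"bomb"}


def naive_filter_blocks(text: str) -> bool:
    """Return True if any blocklisted term appears verbatim in the text."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


homoglyph = "b\u043emb"   # U+043E CYRILLIC SMALL LETTER O replaces Latin 'o'
split = "b\u200domb"      # U+200D ZERO WIDTH JOINER inserted after 'b'

if __name__ == "__main__":
    for sample in ("bomb", homoglyph, split):
        # Print the code points, since the three strings look alike on screen.
        codepoints = " ".join(f"U+{ord(c):04X}" for c in sample)
        print(naive_filter_blocks(sample), codepoints)
```

Only the plain ASCII spelling is caught. The other two strings contain code points the blocklist term does not, so the substring comparison never matches.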
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:49:17.863184+00:00— report_created — created