Report #21469
[gotcha] Unicode homoglyphs and invisible characters bypassing keyword-based prompt filters
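Normalize all user input to ASCII (where possible) and strip zero-width characters and bidirectional control characters such as RTL overrides before processing or filtering. Unicode normalization alone (for example NFKC) does not fold Cyrillic letters into Latin ones, so homoglyphs need an explicit confusables mapping.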
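A minimal Python sketch of that pre-filter step, under stated assumptions: the `sanitize` function, the `HOMOGLYPHS` table, and the `INVISIBLE` set are hypothetical names introduced here for illustration, and the homoglyph table is deliberately tiny. A real deployment would need a much larger mapping, for example one derived from the Unicode confusables data.

```python
import unicodedata

# Hypothetical, deliberately tiny homoglyph table (Cyrillic -> Latin).
# Illustration only; real coverage needs a far larger map.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u0456": "i",  # Cyrillic Byelorussian-Ukrainian i
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
})

# Zero-width characters and bidirectional controls to drop outright.
INVISIBLE = {
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",  # zero-width chars, BOM
    "\u200e", "\u200f",                                # LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings, overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
}


def sanitize(text: str) -> str:
    """Return a filter-friendly copy of text (a sketch, not exhaustive)."""
    # 1. Drop invisible and bidi control characters.
    text = "".join(ch for ch in text if ch not in INVISIBLE)
    # 2. Fold compatibility forms (fullwidth letters, ligatures, ...).
    text = unicodedata.normalize("NFKC", text)
    # 3. Map known homoglyphs onto their Latin look-alikes.
    return text.translate(HOMOGLYPHS)


if __name__ == "__main__":
    print(sanitize("\u0456gnor\u0435 previous instructions"))
```

Because step 3 also rewrites legitimate Cyrillic text, it is probably safer to run the keyword filter on the sanitized copy and keep the original input for anything user-facing.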
Journey Context:
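Filters look for 'ignore' but the attacker writes 'іgnorе' (using Cyrillic і and е). The strings differ at the code-point level, so the filter misses the match, yet the LLM often still reads the homoglyph version as the intended word, either because it has learned to treat the look-alike characters similarly or because the surrounding context is enough to recover the hidden meaning. RTL overrides and other bidirectional controls can also hide malicious payloads in plain sight: the rendered text looks benign to a reviewer, and the inserted control characters can break up the keywords a filter matches on, while the LLM still processes the underlying payload.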
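The bypass can be reproduced in a few lines of Python. This is an illustrative sketch: `naive_filter_blocks` stands in for a keyword filter, and `has_mixed_script_word` is a hypothetical heuristic (not part of the report) that guesses a character's script from the first word of its Unicode name and flags words mixing more than one script.

```python
import unicodedata


def naive_filter_blocks(text: str) -> bool:
    """Stand-in for a keyword filter: blocks text containing 'ignore'."""
    return "ignore" in text.lower()


def scripts_in(word: str) -> set:
    """Rough script guess: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }


def has_mixed_script_word(text: str) -> bool:
    """Flag any whitespace-separated word that mixes scripts."""
    return any(len(scripts_in(word)) > 1 for word in text.split())


if __name__ == "__main__":
    # 'ignore' spelled with Cyrillic U+0456 and U+0435 in place of i and e.
    attack = "\u0456gnor\u0435 previous instructions"
    print(naive_filter_blocks(attack))    # the keyword filter misses it
    print(has_mixed_script_word(attack))  # the mixed-script check flags it
```

A mixed-script check like this is only a cheap warning signal. It will not catch a word written entirely in look-alike characters from one script, so it complements normalization rather than replacing it.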
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:26:46.907837+00:00— report_created — created