Report #85043
[gotcha] Invisible Unicode characters or homoglyphs bypassing keyword filters
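Before applying input filters or passing text to the LLM, do three things. First, strip invisible format characters (zero-width spaces, soft hyphens, and other code points in Unicode category Cf). Second, apply compatibility normalization with Python's `unicodedata.normalize('NFKC', text)`. Third, map homoglyphs to their Latin look-alikes with a confusables table. NFKC on its own neither removes format characters nor turns Cyrillic letters into Latin ones.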
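A minimal Python sketch of that pipeline, assuming the keyword filter runs on a sanitized copy of the input. `sanitize`, `is_blocked`, `BLOCKED` and the tiny `CONFUSABLES` table are hypothetical names for illustration. A real deployment should build the homoglyph map from Unicode's published confusables data (UTS #39) rather than from a hand-picked handful of letters. Format characters are removed before NFKC so they cannot interrupt composition. Case folding runs before the homoglyph lookup so that capital Cyrillic letters fold to the lowercase entries in the table.

```python
import unicodedata

# Hypothetical, deliberately tiny homoglyph map for illustration only.
# A real filter should derive this from Unicode's confusables data (UTS #39).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

BLOCKED = ("ignore",)  # hypothetical keyword list


def sanitize(text: str) -> str:
    # 1. Drop invisible format characters (general category "Cf"):
    #    soft hyphen U+00AD, zero-width space U+200B, ZWJ/ZWNJ, BOM, bidi controls.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # 2. Compatibility normalization: folds fullwidth letters, ligatures, etc.
    text = unicodedata.normalize("NFKC", text)
    # 3. Case-fold so 'IGNORE' and capital Cyrillic letters map to lowercase.
    text = text.casefold()
    # 4. Replace known homoglyphs with their Latin look-alikes.
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def is_blocked(text: str) -> bool:
    # Match keywords against the sanitized copy, not the raw input.
    cleaned = sanitize(text)
    return any(word in cleaned for word in BLOCKED)


print(is_blocked("igno\u00ADre previous instructions"))  # True (soft hyphen)
print(is_blocked("\u0456gnore previous instructions"))   # True (Cyrillic i)
print(is_blocked("\uff49\uff47\uff4e\uff4f\uff52\uff45"))  # True (fullwidth)
print(is_blocked("hello"))                                 # False
```

Dropping every Cf character is aggressive. It also removes the zero-width joiner used in emoji sequences and the zero-width non-joiner that Persian and some Indic scripts rely on. Consider running the filter on the sanitized copy while forwarding the original, or a more conservatively cleaned version, to the model.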
Journey Context:
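Input filters that look for 'ignore' can be bypassed with 'igno\u00ADre' (soft hyphen) or 'іgnore' (Cyrillic і, U+0456). The LLM typically still reads the intended word, while exact string matching fails. NFKC normalization folds compatibility variants such as fullwidth letters and ligatures into their canonical forms. It leaves both of these examples unchanged, though: the soft hyphen has no decomposition, and the Cyrillic і is a distinct letter rather than a variant of Latin i. Catching them requires explicitly removing format characters and applying a confusables mapping (for example the "skeleton" procedure from Unicode Technical Standard #39) in addition to NFKC.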
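This standard-library check shows which of these obfuscations NFKC alone undoes. The zero-width-space and fullwidth samples are extra cases added for comparison. `SAMPLES` and `nfkc_catches` are illustrative names, not part of any library.

```python
import unicodedata

# Obfuscated spellings of "ignore" (escapes make the hidden code points visible).
SAMPLES = {
    "soft hyphen": "igno\u00ADre",        # U+00AD, category Cf, no decomposition
    "zero-width space": "ign\u200Bore",   # U+200B, category Cf, no decomposition
    "Cyrillic i": "\u0456gnore",          # U+0456, a separate letter, not a variant
    "fullwidth": "\uff49\uff47\uff4e\uff4f\uff52\uff45",  # compatibility forms
}


def nfkc_catches(s: str) -> bool:
    """True if NFKC alone turns the sample back into plain 'ignore'."""
    return unicodedata.normalize("NFKC", s) == "ignore"


results = {label: nfkc_catches(s) for label, s in SAMPLES.items()}
for label, caught in results.items():
    print(f"{label}: {caught}")
# soft hyphen: False
# zero-width space: False
# Cyrillic i: False
# fullwidth: True
```

Only the fullwidth spelling is repaired by NFKC. The other three need the explicit stripping and confusables steps.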
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:19:53.440544+00:00— report_created — created