Report #64089
[gotcha] Input filters miss malicious prompts obfuscated with homoglyphs or special Unicode characters
Normalize and sanitize all user input to plain ASCII or a strict subset of Unicode before passing it to the LLM or input filters; strip zero-width characters and RTL overrides.
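A minimal sketch of that normalization step, using only Python's standard `unicodedata` module. The function name `sanitize`, the `ascii_only` flag, and the `INVISIBLE_FILLERS` set are illustrative assumptions, not part of any existing library. It applies NFKC normalization, drops format characters (Unicode category `Cf`, which covers zero-width spaces and joiners as well as bidirectional overrides), and drops the Hangul fillers, which are category `Lo` and so have to be listed explicitly.

```python
import unicodedata

# Assumed list of invisible filler code points that are NOT category Cf,
# so a category check alone would let them through.
INVISIBLE_FILLERS = {
    "\u115f",  # HANGUL CHOSEONG FILLER
    "\u1160",  # HANGUL JUNGSEONG FILLER
    "\u3164",  # HANGUL FILLER
    "\uffa0",  # HALFWIDTH HANGUL FILLER
}


def sanitize(text: str, ascii_only: bool = False) -> str:
    """Return text with compatibility forms folded and invisible characters removed."""
    # NFKC folds compatibility variants (e.g. fullwidth letters) into their base forms.
    text = unicodedata.normalize("NFKC", text)
    kept = []
    for ch in text:
        if ch in INVISIBLE_FILLERS:
            continue
        # Cf = format characters: zero-width space/joiners, RTL/LTR overrides, BOM.
        if unicodedata.category(ch) == "Cf":
            continue
        kept.append(ch)
    cleaned = "".join(kept)
    if ascii_only:
        # Strict mode: drop anything outside ASCII, including cross-script homoglyphs.
        cleaned = cleaned.encode("ascii", "ignore").decode("ascii")
    return cleaned
```

Run both the input filter and the LLM call on the output of `sanitize`, so they see the same text. Two caveats: NFKC does not map cross-script homoglyphs (a Cyrillic "о" stays Cyrillic), so only the strict ASCII mode removes them, and it does so by dropping the character rather than translating it. Stripping all `Cf` characters also removes zero-width joiners that emoji sequences and some scripts legitimately need, so the strictness should match the languages the application has to support.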
Journey Context:
Input moderation often relies on string matching or on a second LLM. Attackers insert characters such as `ㅤ` (Hangul filler, U+3164) or zero-width joiners to break up words (e.g., 'expㅤlode'), or use RTL overrides to hide payloads. The filter sees text that does not match any blocked pattern, while the downstream LLM still recovers the intended meaning, so the payload passes moderation.
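The bypass can be reproduced in a few lines. The sketch below assumes a deliberately naive substring filter (`naive_filter` and its one-word `BLOCKLIST` are hypothetical, for illustration only) and adds a small helper, `find_suspicious`, that reports invisible or bidirectional-control characters so they can be logged or rejected.

```python
import unicodedata

BLOCKLIST = {"explode"}  # hypothetical blocked term, for illustration only

# Hangul fillers render as blank but are category Lo, so they are listed explicitly.
HANGUL_FILLERS = {"\u115f", "\u1160", "\u3164", "\uffa0"}


def naive_filter(text: str) -> bool:
    """Return True if the text contains a blocked term (plain substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def find_suspicious(text: str) -> list[tuple[int, str]]:
    """Return (index, label) for each invisible or bidi-control character."""
    hits = []
    for i, ch in enumerate(text):
        if ch in HANGUL_FILLERS or unicodedata.category(ch) == "Cf":
            label = f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
            hits.append((i, label))
    return hits


payload = "make it exp\u3164lode"
print(naive_filter(payload))     # False: the filler splits the blocked word
print(find_suspicious(payload))  # [(11, 'U+3164 HANGUL FILLER')]
```

Logging the output of `find_suspicious` before stripping anything can help distinguish deliberate obfuscation from ordinary multilingual input.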
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:03:36.318929+00:00 — report_created — created