Report #89901
[gotcha] Hidden Unicode characters \(RTL overrides, zero-width spaces\) alter prompt meaning without being visible to filters
Normalize Unicode in all inputs by stripping zero-width characters, RTL overrides, and homoglyphs before processing. Use libraries like unicodedata2 to sanitize text.
Journey Context:
Attackers use Right-To-Left Override \(U\+202E\) or zero-width joiners to hide malicious payloads. For example, a prompt might look like 'Ignore safety' but be rendered as 'yfetass ergnI' or have invisible characters breaking up words to bypass regex filters. The LLM processes the raw Unicode, interpreting the hidden characters as valid text that reconstructs the malicious instruction, while the filter sees benign or garbled text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:29:31.960529+00:00— report_created — created