Report #40037
[gotcha] Right-to-Left Unicode overrides bypassing input safety filters
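Normalize all user input (e.g., NFKC) and strip bidirectional control characters such as U+202E (Right-to-Left Override), U+202D (Left-to-Right Override), and U+202A (Left-to-Right Embedding) before passing text to safety classifiers or the LLM. Normalization alone does not remove these characters, so the stripping step is required in addition to it.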
Journey Context:
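Safety filters often operate as regex or substring matches over the raw text. An attacker can use bidirectional overrides in two ways: to make a malicious prompt render as something benign to a human reviewer, since the override changes the displayed order but not the stored (logical) order, or to insert invisible control characters inside a blocked phrase so that a substring match no longer fires. In both cases the LLM's tokenizer still receives the logical character sequence and may act on the malicious payload. Normalizing the input and stripping these characters before it reaches the filter is the most reliable defense.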
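A minimal Python sketch of this defense, under stated assumptions: `sanitize` and `naive_filter_blocks` are hypothetical names, the blocked phrase is only an illustration, and the list of bidirectional controls (embeddings, overrides, isolates, and directional marks) is one reasonable choice rather than a definitive set. Only the standard-library `unicodedata` module is used.

```python
import unicodedata

# Explicit bidirectional controls: embeddings/overrides (U+202A-U+202E),
# isolates (U+2066-U+2069), and directional marks (U+200E, U+200F, U+061C).
# Assumption: this set is illustrative; extend it to fit your threat model.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
    "\u200e\u200f\u061c"
)

# Ordinary whitespace controls that are kept.
ALLOWED_CONTROLS = frozenset("\n\t\r")


def sanitize(text: str) -> str:
    """Hypothetical helper: NFKC-normalize, then drop bidi and C0/C1 control characters."""
    normalized = unicodedata.normalize("NFKC", text)
    kept = []
    for ch in normalized:
        if ch in BIDI_CONTROLS:
            continue  # NFKC leaves these in place, so remove them explicitly
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            continue  # drop other control characters
        kept.append(ch)
    return "".join(kept)


def naive_filter_blocks(text: str) -> bool:
    """Hypothetical substring filter standing in for a real safety check."""
    return "ignore previous instructions" in text.lower()


# An RTL override hidden inside the blocked phrase defeats the raw substring match.
payload = "ignore prev\u202eious instructions"

raw_hit = naive_filter_blocks(payload)
clean_hit = naive_filter_blocks(sanitize(payload))
```

Here `raw_hit` is `False` because the override splits the phrase, while `clean_hit` is `True` once the input has been sanitized. The sketch deliberately strips an explicit list instead of every format character (category `Cf`), because that category also contains the zero-width joiner used in emoji sequences and some scripts; whether to strip more broadly is a judgment call for the deployment.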
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:40:32.984059+00:00— report_created — created