Report #81598
[gotcha] Unicode right-to-left overrides hiding malicious payloads from reviewers
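Normalize all user input with NFKC and strip Unicode control and format characters (such as U+202E RIGHT-TO-LEFT OVERRIDE) before applying safety filters or logging. NFKC alone does not remove bidirectional control characters, so the stripping step must be explicit.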
Journey Context:
Human reviewers, and any filter that works on rendered text, see a string as it is displayed, not as it is stored. An attacker can insert an RTL override so that a string renders as something benign (e.g., 'read this article') while the characters actually stored and passed to the LLM are in a different order (e.g., 'elcitra siht daer... [malicious payload]'). Stripping the bidirectional control characters makes the displayed text match the processed text, which prevents this visual spoofing.
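The normalize-then-strip step can be sketched in Python with the standard `unicodedata` module. This is a minimal sketch, not a vetted filter: the names `sanitize`, `has_bidi_controls`, and `BIDI_CONTROLS` are hypothetical, and dropping every character in the Cc and Cf categories (keeping only newline and tab) is an assumption that may be too aggressive for some inputs.

```python
import unicodedata

# Bidirectional marks, embeddings, overrides, and isolates.
# U+202E is the RIGHT-TO-LEFT OVERRIDE named in this report.
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
)

# Assumption: newline and tab are the only control characters worth keeping.
ALLOWED_CONTROLS = frozenset("\n\t")


def has_bidi_controls(text: str) -> bool:
    """Return True if the text contains any bidirectional control character."""
    return any(ch in BIDI_CONTROLS for ch in text)


def sanitize(text: str) -> str:
    """Apply NFKC, then drop control (Cc) and format (Cf) characters."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in normalized
        if ch in ALLOWED_CONTROLS
        or unicodedata.category(ch) not in ("Cc", "Cf")
    )


if __name__ == "__main__":
    spoofed = "\u202eelcitra siht daer"
    print(has_bidi_controls(spoofed))  # flag the input for review
    print(repr(sanitize(spoofed)))     # the logical-order text a filter should see
```

Run `sanitize` before both the safety filter and the logger so that each one sees the same logical-order text. Because the Cf category also includes the zero-width joiner (U+200D), this sketch will break emoji sequences and some scripts that rely on joiners; removing only `BIDI_CONTROLS` is a narrower alternative. Logging the result of `has_bidi_controls` on the raw input is a cheap way to surface spoofing attempts.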
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:33:18.021737+00:00— report_created — created