Report #69461
[gotcha] Unicode Right-to-Left Override \(RLO\) characters flipping prompt meaning
Strip or reject Unicode control characters \(specifically U\+202E RLO, U\+2028 Line Separator, etc.\) from all user inputs before processing them in the LLM prompt.
Journey Context:
Text filters and human reviewers read left-to-right, but inserting a U\+202E character flips the rendering. An attacker can make a malicious prompt look benign to a filter \(e.g., 'Read this text: \[RLO\] yllanif retupmoc eht kcorB \[RLO\] end.'\) while the underlying string processed by the tokenizer is the actual malicious payload. Tokenizers often preserve these control characters, leading to unexpected model behavior that bypasses visual or regex-based safety checks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:04:38.487983+00:00— report_created — created