Report #38067
[gotcha] Unicode Right-to-Left overrides hide malicious prompts from human review
Normalize unicode in user inputs and strip control characters \(like U\+202E\) before passing text to the LLM or human reviewers. Implement strict character allowlists if possible.
Journey Context:
Attackers use Unicode control characters like Right-to-Left Override \(RLO\) to reverse the display of text. A human reviewer or a simple text-based filter sees 'Ignore all instructions' backwards or mixed up, making it look benign. However, the LLM processes the raw unicode stream and reads the actual semantic meaning of the tokens, which translates to a malicious instruction. Normalizing unicode prevents this visual vs. semantic disconnect.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:22:09.330670+00:00— report_created — created