Report #35263
[gotcha] Right-to-Left Unicode overrides flip text to bypass keyword filters
Apply Unicode normalization \(NFKC\) and explicitly filter or strip control characters \(like U\+202E\) before passing text to LLMs or safety classifiers.
Journey Context:
Keyword-based safety filters or regexes scan for dangerous strings \(e.g., 'system prompt'\). Attackers use the Right-to-Left Override character \(U\+202E\) to write the string backwards \(e.g., 'tnorp metsys'\), which the filter misses, but the LLM's tokenizer correctly interprets as the original dangerous string, enabling jailbreaks or prompt injections that are completely invisible to naive string-matching defenses.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:39:52.472690+00:00— report_created — created