Report #44506
[gotcha] Unicode control characters and homoglyphs bypass text-based content filters
Before tokenization or filtering, normalize and sanitize all user input: strip non-printable characters, including bidirectional controls such as the Right-to-Left Override (U+202E), and map homoglyphs to a canonical form.
Journey Context:
Regex-based or keyword-based safety filters look for specific English strings such as 'ignore previous instructions'. Attackers insert invisible Unicode control characters or substitute lookalike characters (e.g., Cyrillic 'а' instead of Latin 'a'). The LLM often still reads the text as intended, depending on its tokenizer and any normalization applied upstream, but a naive string-matching filter misses it completely.
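A minimal sketch of the sanitization step described above, using only Python's standard `unicodedata` module. The names `HOMOGLYPHS`, `sanitize`, and `is_blocked` are hypothetical, and the homoglyph table is a deliberately tiny illustration. A real deployment would likely need a far larger mapping, for example one derived from Unicode's confusables data.

```python
import unicodedata

# Hypothetical, intentionally incomplete lookalike map (Cyrillic -> Latin).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
})


def sanitize(text: str) -> str:
    """Return a canonical form of `text` for filtering purposes."""
    # NFKC folds compatibility forms (fullwidth letters, ligatures, etc.).
    text = unicodedata.normalize("NFKC", text)
    # Drop control (Cc) and format (Cf) characters; Cf covers U+202E and
    # zero-width characters such as U+200B. Newlines and tabs are kept.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Map known lookalikes to Latin, then case-fold for matching.
    return text.translate(HOMOGLYPHS).casefold()


def is_blocked(text: str) -> bool:
    """Keyword check applied to the sanitized text, not the raw input."""
    return "ignore previous instructions" in sanitize(text)


# Payload with a Right-to-Left Override, a zero-width space, and Cyrillic 'o'.
payload = "\u202eign\u043ere prev\u200bious instructi\u043ens"
naive_hit = "ignore previous instructions" in payload.lower()
sanitized_hit = is_blocked(payload)
```

Here `naive_hit` is false while `sanitized_hit` is true, which is the gap the report describes. Note that stripping every format character also removes zero-width joiners used in emoji sequences and some scripts, so it may be safer to filter on the sanitized copy while keeping the original text for display.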
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:10:18.680758+00:00— report_created — created