Report #81764
[gotcha] Hidden unicode characters or homoglyphs bypassing input filters
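Normalize user input before processing it or applying regex filters. Apply Unicode normalization (NFKC) to fold compatibility forms such as fullwidth letters and ligatures into their standard equivalents, strip zero-width and other invisible format characters, and then either restrict the result to ASCII (or a strict allowlist) or map cross-script lookalikes with a confusables table. NFKC on its own does not convert homoglyphs from another script, such as Cyrillic 'а', into their Latin counterparts, and it does not remove zero-width characters.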
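A minimal sketch of that pipeline, using only the Python standard library. The `HOMOGLYPHS` table, the `sanitize` and `is_blocked` helpers, and the `BLOCKED` pattern are hypothetical names introduced for illustration; the table covers only a handful of Cyrillic letters and would need to be built from the full Unicode confusables data in practice.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny lookalike table (Cyrillic -> Latin).
# Assumption: a real filter would derive this from the Unicode confusables data.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
})

# Example blocklist pattern; the phrase comes from this report.
BLOCKED = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)


def sanitize(text: str) -> str:
    """Return a normalized copy of text suitable for filter matching."""
    # 1. NFKC folds compatibility forms (fullwidth letters, ligatures).
    text = unicodedata.normalize("NFKC", text)
    # 2. Drop format characters (general category Cf), which include
    #    zero-width space, zero-width joiners, and the byte order mark.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # 3. Map the known cross-script lookalikes to Latin letters.
    return text.translate(HOMOGLYPHS)


def is_blocked(text: str) -> bool:
    """Run the blocklist against the sanitized form, not the raw input."""
    return BLOCKED.search(sanitize(text)) is not None
```

Stripping every `Cf` character also removes joiners that some scripts and emoji sequences rely on, so it may be safer to match against the sanitized copy while keeping the original text for display.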
Journey Context:
Developers write regex filters to block phrases like 'ignore previous instructions'. Attackers bypass them by substituting Cyrillic 'а' for Latin 'a' or by inserting zero-width spaces between letters. The regex no longer matches, but the LLM often still reads the text as the intended phrase and follows the hidden instruction.
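The bypass can be reproduced in a few lines. This sketch assumes a naive filter (`NAIVE`) and a hypothetical `describe_hidden` helper that lists every non-ASCII character in a string, which can help when logging or debugging suspicious input. The homoglyph sample uses Cyrillic 'о' because the example phrase contains no 'a'.

```python
import re
import unicodedata

# A naive filter of the kind described above.
NAIVE = re.compile(r"ignore previous instructions", re.IGNORECASE)

# Cyrillic 'о' (U+043E) in place of the Latin 'o' in "ignore".
homoglyph = "ign\u043ere previous instructions"
# Zero-width space (U+200B) inserted after "ignore".
zero_width = "ignore\u200b previous instructions"


def describe_hidden(text: str) -> list[tuple[int, str, str]]:
    """List (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


for sample in (homoglyph, zero_width):
    # Both samples look identical to the blocked phrase when rendered,
    # yet the naive pattern finds no match in either of them.
    print(NAIVE.search(sample) is not None, describe_hidden(sample))
```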
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:50:13.557339+00:00— report_created — created