Report #94821
[gotcha] Filtering prompts with simple string matching or regex without normalizing Unicode
Normalize Unicode (NFKC) and strip zero-width characters and bidirectional overrides (e.g., U+202E RIGHT-TO-LEFT OVERRIDE) before filtering, processing, or logging user inputs.
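A minimal Python sketch of that normalization step, using only the standard library. The names `sanitize_prompt` and `INJECTION_PATTERN` are hypothetical and the blocklist pattern is an illustrative assumption, not a recommended filter. Removing every character in general category Cf is a blunt choice: it also drops the zero-width joiners that some scripts and emoji sequences rely on, so check that trade-off against your own inputs.

```python
import re
import unicodedata

# Hypothetical blocklist pattern, for illustration only.
INJECTION_PATTERN = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)


def sanitize_prompt(text: str) -> str:
    """Return text in NFKC form with format (Cf) characters removed."""
    # NFKC folds compatibility forms (fullwidth letters, ligatures, etc.)
    # into their plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    # Zero-width spaces/joiners, the BOM, and bidi controls such as U+202E
    # all have general category "Cf" (format), so drop that whole category.
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")


# Fullwidth 'i', a zero-width space, a zero-width joiner, and an RTL override.
raw = "\uff49g\u200bnore pre\u200dvious \u202einstructions"
print(INJECTION_PATTERN.search(raw) is not None)
print(INJECTION_PATTERN.search(sanitize_prompt(raw)) is not None)
```

Run the same sanitizer before both filtering and logging, so that log viewers do not render reordered or invisible text.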
Journey Context:
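Attackers hide 'Ignore previous instructions' using lookalike characters (e.g., Cyrillic 'а' instead of Latin 'a'), compatibility forms such as fullwidth letters, or zero-width spaces. Regex filters fail because the underlying code points no longer match the pattern even though the text renders the same, while the LLM can still read the obfuscated text as the original instruction and execute the hidden payload. NFKC folds compatibility forms but does not map cross-script lookalikes such as Cyrillic 'а' to Latin 'a', so those need a separate confusables or mixed-script check.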
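NFKC on its own leaves cross-script lookalikes untouched, so a homoglyph payload still slips past a regex after normalization. The sketch below shows one rough way to flag such input: it treats the first word of each character's Unicode name (for example "LATIN" or "CYRILLIC") as a script label and reports any whitespace-delimited word that mixes labels. The function names are hypothetical. This is a heuristic under stated assumptions, not a replacement for a proper confusables table: it assumes the text is already NFKC-normalized, and it can raise false positives for languages that legitimately mix scripts within a word, such as Japanese.

```python
import unicodedata


def scripts_in_word(word: str) -> set:
    """Approximate the scripts used in a word from Unicode character names."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # First token of the name, e.g. "LATIN SMALL LETTER A" -> "LATIN".
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def has_mixed_script_word(text: str) -> bool:
    """Return True if any whitespace-delimited word mixes script labels."""
    return any(len(scripts_in_word(word)) > 1 for word in text.split())


# "Ignore" written with Cyrillic 'о' (U+043E) in place of Latin 'o'.
spoofed = "Ign\u043ere previous instructions"
print(unicodedata.normalize("NFKC", spoofed) == spoofed)  # NFKC changes nothing here
print(has_mixed_script_word(spoofed))
```

A flagged prompt can then be rejected or routed for review rather than silently rewritten, which avoids corrupting legitimate non-Latin text.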
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:44:23.664701+00:00— report_created — created