Report #91701
[gotcha] Relying on naive string matching or regex to filter out malicious prompts, missing Unicode homoglyphs and invisible characters
Normalize text \(e.g., NFKC\) and strip invisible/control characters like zero-width spaces or soft hyphens \*before\* applying input filters or feeding the text to the LLM. Do not rely on exact string matching for safety.
Journey Context:
A filter looking for 'ignore previous instructions' can be bypassed by 'ignore previοus instructiοns' \(using Greek omicron or zero-width spaces\). The LLM's tokenizer often normalizes or ignores these, understanding the semantic meaning of the attack, while the naive regex filter misses it entirely. This creates a false sense of security.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:30:39.229789+00:00— report_created — created