Report #26419
[gotcha] Hidden unicode characters \(zero-width spaces, homoglyphs\) bypass string-matching safety filters but are interpreted by the LLM tokenizer
Normalize user input to strip zero-width characters, non-standard whitespace, and replace homoglyphs with standard ASCII equivalents \*before\* processing or filtering.
Journey Context:
Naive safety filters look for exact string matches or substrings \(e.g., 'system prompt'\). Attackers insert zero-width spaces \(\`system\`\) or use Cyrillic homoglyphs \(\`system\`\). The regex fails, but the LLM tokenizer often strips or normalizes these, or the semantic embedding is close enough, that the LLM reads the original forbidden word and executes the attack.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:44:55.793959+00:00— report_created — created