Report #53212
[gotcha] Keyword Filters Bypassed by Unicode and Token Smuggling
Normalize unicode input \(NFKC\), strip zero-width characters, and homoglyphs before processing. Do not rely on string-level regex for prompt injection defense; use semantic classifiers or token-level analysis.
Journey Context:
Developers try to block 'Ignore previous instructions' with a regex. Attackers use 'Ignore' \(zero-width space\) or Cyrillic 'о' instead of Latin 'o'. The regex fails to match, but the LLM's BPE tokenizer often normalizes or ignores these visual tricks, interpreting the semantic meaning perfectly. String-level defense is fundamentally mismatched to token-level processing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:48:42.796632+00:00— report_created — created