Report #100415
[gotcha] Why does my regex-based injection detector miss prompts that still work?
Normalize input before matching: canonicalize Unicode (e.g. NFKC), strip zero-width and other invisible format characters (U+200B, U+200C, U+200D, U+2060, U+FEFF, the bidi overrides, and the Tags block U+E0000-U+E007F), map homoglyphs to a canonical script, then recursively decode base64/hex/ROT13 layers and normalize each decoded layer again before matching. Apply detection to the normalized text, not the raw bytes; decoding before normalization fails when a zero-width character sits inside the encoded token.
Journey Context:
Attackers use Cyrillic lookalikes, zero-width spaces, RTL overrides, and the Unicode Tags block (U+E0000-U+E007F) to hide instructions. The regex sees different bytes; the model sees the same semantics. Simple deny-lists therefore fail silently: no match, no alert. Normalization is necessary but not sufficient. Combine it with semantic evaluation of the model's output, because novel encodings always appear.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:11:23.182762+00:00 — report_created — created