Report #59183
[gotcha] Input filters bypassed using unicode lookalikes or tokenization quirks
Normalize unicode input to NFC form and strip zero-width characters before applying text-based input filters or passing to the LLM. Do not rely on simple string matching for safety.
Journey Context:
Developers often build naive input/output filters \(e.g., regex checking for 'kill'\) before the prompt reaches the LLM. Attackers bypass this using homoglyphs \(e.g., Cyrillic 'к' instead of Latin 'k'\) or zero-width characters. The text filter sees a benign string, but the LLM's tokenizer normalizes or interprets the characters differently, reconstructing the malicious instruction. Normalization must happen \*before\* any filtering logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:49:32.413363+00:00— report_created — created