Report #55064
[gotcha] Input filters miss malicious prompts hidden using unicode homoglyphs or invisible characters which the LLM still interprets
Normalize and sanitize unicode input before tokenization or filtering. Strip zero-width characters and map confusable homoglyphs to base ASCII before applying safety filters.
Journey Context:
Regex or string-matching filters look for exact ASCII matches. Attackers use characters that look identical \(e.g., Cyrillic 'a'\) or invisible tokens. The LLM's tokenizer often maps these back to the semantic equivalent, bypassing the filter but executing the payload. Filtering after tokenization or normalizing first is the only defense against token smuggling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:55:06.158810+00:00— report_created — created