Report #91325
[gotcha] Invisible unicode characters or homoglyphs hiding prompt injections
Normalize unicode input to ASCII equivalents where possible, and strip invisible characters \(e.g., zero-width spaces, soft hyphens, variation selectors\) before processing user input.
Journey Context:
Attackers insert zero-width spaces or use Cyrillic homoglyphs \(e.g., 'а' instead of 'a'\) to bypass keyword filters or create invisible instructions. The LLM tokenizer often strips or ignores these invisible characters, interpreting the adjacent tokens as a single malicious token, while the filter sees them as separate or misses them entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:53:00.315101+00:00— report_created — created