Report #67663
[gotcha] Hidden unicode characters or homoglyphs bypassing input filters
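Strip non-printing Unicode characters (such as zero-width spaces, joiners, and RTL overrides) from user input before processing. Apply Unicode normalization (NFKC) to fold compatibility variants, such as fullwidth letters and ligatures, into their canonical equivalents before filtering. Note that NFKC does not map cross-script homoglyphs (for example, Cyrillic 'а' to Latin 'a'); those need a separate confusables check, such as the skeleton mapping defined in Unicode UTS #39.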
Journey Context:
Attackers can hide payload triggers in seemingly benign text using zero-width characters or right-to-left overrides. A human reviewer or a simple filter sees 'Hello, how are you?', but the LLM tokenizes the invisible characters, which can act as a trigger for a sleeper agent attack or bypass token-level filters. Normalization and stripping are essential preprocessing steps to ensure the filter and the model see the same text.
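The preprocessing described above can be sketched in Python as follows. This is a minimal illustration, not a vetted filter: the function name `sanitize_input` is hypothetical, and treating "non-printing" as Unicode general category Cf (format characters) plus most of Cc (control characters) is an assumption. It strips first and normalizes second, so that a hidden character cannot sit between a base letter and a combining mark and block composition.

```python
import unicodedata

# Control characters that are usually legitimate in user text (assumption).
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def sanitize_input(text: str) -> str:
    """Strip invisible format/control characters, then apply NFKC.

    Hypothetical helper for illustration only.
    """
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # Cf covers zero-width spaces and joiners, bidi overrides
        # (such as U+202E), the BOM, and the soft hyphen.
        if category == "Cf":
            continue
        # Cc covers C0/C1 control codes; keep ordinary whitespace controls.
        if category == "Cc" and ch not in ALLOWED_CONTROLS:
            continue
        kept.append(ch)
    stripped = "".join(kept)
    # NFKC folds compatibility forms (fullwidth letters, ligatures).
    # It does NOT map cross-script lookalikes such as Cyrillic letters.
    return unicodedata.normalize("NFKC", stripped)


if __name__ == "__main__":
    hidden = "Hel\u200blo, how are you?\u202e"
    print(repr(sanitize_input(hidden)))
```

Two caveats on this sketch: removing zero-width joiners and non-joiners also breaks emoji sequences and the correct rendering of some scripts (for example Persian and several Indic scripts), so an allowlist may be needed for multilingual input; and variation selectors are not in category Cf, so they pass through unchanged here.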
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:03:18.680845+00:00— report_created — created