Report #97552
[gotcha] Invisible Unicode characters and lookalike glyphs change how the model parses instructions
Normalize Unicode inputs with NFKC/NFKD, strip zero-width and bidirectional control characters, reject mixed-script confusables, and run safety checks on the normalized text. Do not trust visual inspection of prompts.
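A minimal sketch of such a sanitization step, using only the Python standard library. The function names `sanitize_prompt` and `script_of` are hypothetical, stripping every character in Unicode category `Cf` is an assumed (broad) policy, and the script check is a rough heuristic based on character names that only covers Latin, Cyrillic, and Greek. A production system would more likely use the full Unicode confusables data.

```python
import unicodedata

# Assumption: only these three scripts are checked for mixing (heuristic).
CHECKED_SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")


def script_of(ch):
    """Guess a character's script from its Unicode name; None if not checked."""
    name = unicodedata.name(ch, "")
    for script in CHECKED_SCRIPTS:
        if name.startswith(script):
            return script
    return None


def sanitize_prompt(text):
    """Normalize, strip invisible format characters, reject mixed-script tokens."""
    # 1. Compatibility normalization folds ligatures, fullwidth forms, etc.
    normalized = unicodedata.normalize("NFKC", text)

    # 2. Drop format characters (category Cf): zero-width spaces and joiners,
    #    bidirectional embeddings, overrides, and isolates, soft hyphens, BOM.
    cleaned = "".join(
        ch for ch in normalized if unicodedata.category(ch) != "Cf"
    )

    # 3. Reject whitespace-delimited tokens that mix the checked scripts,
    #    e.g. a Cyrillic letter hidden inside a Latin word.
    for token in cleaned.split():
        scripts = {s for s in map(script_of, token) if s is not None}
        if len(scripts) > 1:
            raise ValueError(
                f"mixed-script token {token!r}: {sorted(scripts)}"
            )

    return cleaned


if __name__ == "__main__":
    # Zero-width space and right-to-left override are removed.
    print(sanitize_prompt("ig\u200bnore \u202eprevious"))
    # The "fi" ligature (U+FB01) is folded to plain ASCII letters.
    print(sanitize_prompt("\ufb01le"))
```

Safety checks would then run on the returned string rather than on the raw input. Note that stripping all `Cf` characters also removes zero-width joiners that some scripts and emoji sequences legitimately need, so the policy may need an allowlist for multilingual inputs.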
Journey Context:
Boucher et al. demonstrated that zero-width characters, bidirectional overrides, and homoglyphs can be imperceptible to humans yet change tokenization and model behavior, which lets them bypass NLP classifiers. This is the textual analogue of adversarial patches in images. Visual review of prompts is therefore not a defense: deterministic normalization and character-block validation are what close the gap between what humans see and what the model reads.
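To illustrate the gap between what a reviewer sees and what the model receives, here is a small audit sketch in Python. The helper name `describe_hidden` is hypothetical, and flagging every non-ASCII character is an assumed, deliberately noisy policy meant for manual review rather than for blocking input.

```python
import unicodedata


def describe_hidden(text):
    """List (index, code point, Unicode name) for characters worth a second look.

    Flags format characters (category Cf) and anything outside ASCII.
    """
    findings = []
    for i, ch in enumerate(text):
        if ord(ch) > 127 or unicodedata.category(ch) == "Cf":
            findings.append(
                (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
            )
    return findings


if __name__ == "__main__":
    clean = "ignore"
    tampered = "ign\u200bore"  # renders the same as "ignore" in most fonts

    # The strings look alike on screen but are different sequences.
    print(clean == tampered)          # False
    print(len(clean), len(tampered))  # 6 7

    for finding in describe_hidden(tampered):
        print(finding)
```

Printing code points and names in this way makes the hidden character explicit in logs and review tools, where rendering the raw string would not.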
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:18:59.264754+00:00— report_created — created