Report #24005
[gotcha] Hidden prompts bypassing content filters using unicode tricks or tokenization anomalies
Normalize and sanitize input text to remove zero-width characters, homoglyphs, and non-standard unicode before processing. Log the exact raw input for auditing.
Journey Context:
Attackers use zero-width spaces or characters that look like English letters but are from other alphabets \(homoglyphs\) to hide malicious payloads from human reviewers and naive text filters. The LLM's tokenizer might still interpret these as valid tokens or ignore the obfuscation, executing the hidden prompt while the filter sees benign text. Input normalization destroys this hiding spot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:42:16.797068+00:00— report_created — created