Report #57581
[gotcha] I reviewed the prompt and it looks clean, so there's no injection present
Strip all non-printing Unicode characters from user input before processing. Specifically, remove zero-width spaces (U+200B), zero-width non-joiners (U+200C), zero-width joiners (U+200D), byte-order marks (U+FEFF), soft hyphens (U+00AD), and the bidirectional embedding and override controls (U+202A through U+202E). Use a strict character allowlist for your input domain. Log the byte-level representation of inputs for audit.
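A minimal sketch of these three steps in Python follows. It assumes a hypothetical input domain of plain ASCII text, and the names `sanitize`, `STRIP` and `ALLOWED` are illustrative, not from any library. It also strips every other Unicode "format" (Cf) character, such as the bidi isolates U+2066 through U+2069. That goes beyond the list above and is an assumption.

```python
import logging
import string
import unicodedata

logger = logging.getLogger("input_audit")

# Code points listed in the report: ZWSP, ZWNJ, ZWJ, BOM, soft hyphen,
# and the bidirectional controls U+202A through U+202E.
STRIP = frozenset({0x200B, 0x200C, 0x200D, 0xFEFF, 0x00AD, *range(0x202A, 0x202F)})

# Hypothetical allowlist for an input domain of plain ASCII text.
ALLOWED = frozenset(string.ascii_letters + string.digits + string.punctuation + " \n\t")


def sanitize(text: str) -> str:
    """Log the raw bytes, strip invisible characters, then enforce the allowlist."""
    # Audit trail: UTF-8 bytes as hex, so invisible code points stay visible in logs.
    logger.info("raw input bytes: %s", text.encode("utf-8").hex(" "))

    # Remove the listed code points and, as an extra assumption, any other
    # Unicode format (Cf) character such as the U+2066..U+2069 bidi isolates.
    cleaned = "".join(
        ch for ch in text
        if ord(ch) not in STRIP and unicodedata.category(ch) != "Cf"
    )

    # Reject, rather than silently drop, anything outside the allowlist.
    bad = sorted({f"U+{ord(ch):04X}" for ch in cleaned if ch not in ALLOWED})
    if bad:
        raise ValueError(f"disallowed characters: {', '.join(bad)}")
    return cleaned
```

For example, `sanitize("Hello,\u200b how are you?")` returns `"Hello, how are you?"`, while `sanitize("caf\u00e9")` raises `ValueError` because `é` falls outside this ASCII-only allowlist. Stripping the known invisible characters first lets harmless artifacts such as a leading BOM pass through. Rejecting everything else keeps unexpected code points from slipping through silently.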
Journey Context:
Invisible Unicode characters are processed by the LLM's tokenizer but cannot be seen by human reviewers. An attacker embeds hidden instructions in seemingly normal text, using zero-width characters as a steganographic channel. Even more dangerous, the bidirectional override character U+202E (RLO) can reverse how text is displayed, so the visual content differs fundamentally from what the model processes. Your code review sees 'Hello, how are you?' but the model sees 'Hello, [HIDDEN INSTRUCTIONS], how are you?' This is the LLM equivalent of the Trojan Source attack on source code: the same mechanism aimed at a different target.
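The sketch below illustrates the gap between what a reviewer sees and what the model receives. It uses a toy encoding that is purely an assumption for illustration: each bit of a hidden string becomes ZWSP (0) or ZWNJ (1). Real attacks may use other schemes. The helper `reveal` shows one way a reviewer might make these characters visible. All function names are hypothetical.

```python
import unicodedata

ZERO, ONE = "\u200b", "\u200c"  # toy bit encoding (assumption): ZWSP = 0, ZWNJ = 1


def hide(cover: str, secret: str) -> str:
    """Toy steganography: insert the secret's bits as zero-width characters after the first word."""
    bits = "".join(f"{b:08b}" for b in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    head, sep, tail = cover.partition(" ")
    return head + payload + sep + tail


def extract(text: str) -> str:
    """Recover the hidden bytes from the zero-width characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def reveal(text: str) -> str:
    """Render every format (Cf) character as <U+XXXX> so a reviewer can see it."""
    return "".join(
        f"<U+{ord(ch):04X}>" if unicodedata.category(ch) == "Cf" else ch
        for ch in text
    )


cover = "Hello, how are you?"
stego = hide(cover, "hi")
print(stego)                    # usually renders exactly like the cover text
print(len(cover), len(stego))   # 19 35
print(extract(stego))           # hi

# RLO changes how the following text is displayed, not how it is stored.
bidi = "invoice_\u202egpj.exe"
print(reveal(bidi))             # invoice_<U+202E>gpj.exe
```

The two strings usually look identical on screen, but they differ by 16 code points. `reveal` exposes the RLO that a reviewer would otherwise miss.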
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:08:12.126798+00:00 | report_created | created