Report #65885
[gotcha] Zero-width and other invisible Unicode characters can carry hidden instructions that human reviewers cannot see
Strip zero-width characters (U+200B, U+200C, U+200D, U+FEFF) and other invisible Unicode from all input before LLM processing. Apply Unicode Normalization Form KC (NFKC). Explicitly filter control characters (category Cc) and format characters (category Cf); NFKC alone does not remove zero-width characters. Audit every input pipeline path: a single unsanitized path is enough for an attack.
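The steps above can be sketched with Python's standard `unicodedata` module. This is a minimal sketch, not a vetted sanitizer: the function name `sanitize_text` and the choice to keep newline, tab, and carriage return are assumptions made for illustration.

```python
import unicodedata

# Zero-width characters named in this report (all are category Cf;
# listed explicitly so the intent is obvious to readers).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

# Control characters deliberately kept (assumption: plain-text input
# where line breaks and tabs are meaningful).
KEEP = {"\n", "\t", "\r"}


def sanitize_text(text: str) -> str:
    """Apply NFKC, then drop zero-width, control (Cc), and format (Cf) characters."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in normalized
        if ch in KEEP
        or (ch not in ZERO_WIDTH and unicodedata.category(ch) not in ("Cc", "Cf"))
    )


if __name__ == "__main__":
    poisoned = "ig\u200bnore\u200d all\ufeff"
    print(repr(sanitize_text(poisoned)))
```

Calling this one function at every ingestion point (API payloads, file uploads, database imports) keeps the rule in one place. One trade-off to check before adopting it: dropping U+200D and U+200C also breaks emoji joiner sequences and some scripts that rely on them, so multilingual pipelines may need a narrower filter.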
Journey Context:
An attacker embeds instructions using zero-width spaces or joiners inside seemingly normal text. To a human reviewer or content moderator, the text looks completely benign. To the LLM, the invisible characters are still part of the input and can encode instructions. This is particularly dangerous in content moderation workflows where humans review flagged content: they see nothing wrong and approve it. The attack also works in RAG corpora, where a poisoned document with invisible instructions passes human review but takes effect when it is retrieved. The fix is straightforward but often overlooked because these characters do not render in most logs, debuggers, or code review tools. The gotcha: sanitization must happen before the text reaches the LLM, not after, and must cover every input path, including database imports, API payloads, and file uploads.
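Because these characters do not render, it can help to make them visible when auditing an input path or reviewing flagged content. The sketch below is illustrative only; the helper names `find_invisible` and `reveal` are hypothetical, and it treats newline, tab, and carriage return as expected whitespace.

```python
import unicodedata

# Whitespace controls treated as expected (assumption for plain-text input).
EXPECTED = {"\n", "\t", "\r"}


def find_invisible(text: str) -> list:
    """Return (index, code point, Unicode name) for each control or format character."""
    hits = []
    for i, ch in enumerate(text):
        if ch in EXPECTED:
            continue
        if unicodedata.category(ch) in ("Cc", "Cf"):
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


def reveal(text: str) -> str:
    """Replace each control or format character with a visible <U+XXXX> marker."""
    return "".join(
        f"<U+{ord(ch):04X}>"
        if ch not in EXPECTED and unicodedata.category(ch) in ("Cc", "Cf")
        else ch
        for ch in text
    )


if __name__ == "__main__":
    sample = "ok\u200bay"
    print(find_invisible(sample))
    print(reveal(sample))
```

Logging `reveal(text)` rather than the raw string gives reviewers something they can actually see, and a non-empty result from `find_invisible` on data that has supposedly been sanitized points to an input path that bypasses the filter.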
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:04:18.604392+00:00— report_created — created