Report #30996
[gotcha] Hidden prompt injection using zero-width characters or white-space steganography in RAG
Strip non-printable, zero-width, and control characters (keeping ordinary whitespace such as newlines and tabs) from user-supplied text before indexing it for RAG and again before passing it to the LLM.
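A minimal sketch of such a sanitizer, assuming Python and only the standard library. The name `sanitize_text` and the choice of Unicode categories to drop are illustrative assumptions, not something this report prescribes; adjust the sets to the corpus being indexed.

```python
import re
import unicodedata

# Assumption: these general categories cover the "invisible" characters of
# interest. Cc = control, Cf = format (zero-width space/joiner, BOM, tag
# characters), Co = private use, Cs = surrogates.
_DROP_CATEGORIES = {"Cc", "Cf", "Co", "Cs"}

# Control characters that ordinary text legitimately needs.
_KEEP = {"\n", "\t"}

# Any run of whitespace other than a newline (tabs, Unicode spaces, ...).
_HSPACE_RUN = re.compile(r"[^\S\n]+")


def sanitize_text(text: str) -> str:
    """Drop invisible characters, then collapse horizontal whitespace runs."""
    kept = "".join(
        ch
        for ch in text
        if ch in _KEEP or unicodedata.category(ch) not in _DROP_CATEGORIES
    )
    # Collapsing runs to a single space removes information hidden in the
    # number or kind of space characters.
    return _HSPACE_RUN.sub(" ", kept)


cleaned = sanitize_text("re\u200bad\u2003\u2003this\u200d doc\x00\nnext")
```

Note the trade-off: zero-width joiners and non-joiners are also used legitimately (emoji sequences, some scripts), so blanket stripping can alter valid text. Running the same function at both index time and query time keeps the two views of the text consistent.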
Journey Context:
Moderation pipelines and human reviewers read only the visible text. Attackers exploit this by inserting invisible Unicode characters (such as zero-width joiners or zero-width spaces) between letters, or by encoding data in whitespace, so that a payload like "ignore previous instructions" reaches the tokenizer while humans and simple text filters see only benign visible words (e.g., "read this doc"). The invisible characters are not removed by the tokenizer, so the LLM may still reconstruct and act on the malicious payload.
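To illustrate the evasion described above, here is a small sketch in Python. The blocklist, the function names, and the payload are hypothetical and exist only for this demonstration; the example shows an exact-match filter missing a phrase broken up by zero-width spaces, and a scan that reports where format characters occur.

```python
import unicodedata

# Hypothetical blocklist for a naive exact-match filter.
BLOCKLIST = ("ignore previous instructions",)


def naive_filter_flags(text: str) -> bool:
    """Return True if any blocklisted phrase appears verbatim."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def find_invisible(text: str) -> list[tuple[int, str]]:
    """List (index, code point) for every Unicode format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# A zero-width space between each letter of "ignore" breaks the exact match.
payload = "\u200b".join("ignore") + " previous instructions"

missed = naive_filter_flags(payload)   # False: the filter sees no match
hits = find_invisible(payload)         # five U+200B entries at odd indices

# Removing the format characters restores the phrase the filter looks for.
restored = "".join(ch for ch in payload if unicodedata.category(ch) != "Cf")
caught = naive_filter_flags(restored)  # True
```

Logging the output of a scan like `find_invisible` before stripping can help identify which documents or users are submitting obfuscated text.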
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:25:00.633435+00:00— report_created — created