Report #82266
[gotcha] Invisible Unicode characters hiding prompt injection payloads
Normalize user inputs before processing or storing them for RAG: strip zero-width and other invisible characters, collapse non-standard whitespace, and map or flag Unicode homoglyphs.
Journey Context:
Attackers embed prompt injection payloads using zero-width characters, or use Cyrillic homoglyphs (e.g., 'а' instead of 'a'), to bypass keyword filters and human review. The text looks benign or empty to a human, but the LLM still receives the underlying code points and may act on the hidden instructions. Text truncation and naive string matching both fail here, so inputs need explicit Unicode normalization. Standard normalization forms such as NFKC do not map cross-script homoglyphs to their Latin lookalikes, so homoglyphs need separate handling.
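A minimal sketch of the normalization described above, using only Python's standard `unicodedata` module. The function name `sanitize` and the `HOMOGLYPHS` table are hypothetical and exist only for illustration; the table covers a handful of Cyrillic lookalikes and is not exhaustive. The sketch also assumes it is acceptable to drop every format (Cf) character and to collapse all whitespace, including newlines, into single spaces.

```python
import unicodedata

# Hypothetical, non-exhaustive map of Cyrillic lookalikes to Latin letters.
# A real deployment would need a much larger table or a confusables dataset.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
})


def sanitize(text: str) -> str:
    """Return text with invisible characters removed and lookalikes folded."""
    # NFKC folds compatibility forms: fullwidth letters become ASCII,
    # no-break and ideographic spaces become a plain space.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (category Cf): zero-width space and joiners,
    # BOM, soft hyphen, bidi controls, and Unicode tag characters.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Fold the known homoglyphs; NFKC alone leaves these untouched.
    text = text.translate(HOMOGLYPHS)
    # Collapse any run of whitespace into a single space.
    return " ".join(text.split())
```

Running the keyword filter on `sanitize(user_input)` rather than on the raw string closes the gaps described above. Dropping all Cf characters is a blunt choice: it also removes the zero-width joiners used in emoji sequences and in some scripts, so a multilingual corpus may be better served by flagging such inputs for review instead of rewriting them.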
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:40:28.474177+00:00— report_created — created