Report #36991
[gotcha] Unicode control characters hide malicious prompts from human reviewers and naive text filters
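Normalize Unicode text (NFKC) and strip control and invisible format characters (such as U+202E, RIGHT-TO-LEFT OVERRIDE) from all user inputs and RAG documents before passing them to the LLM or displaying them to humans. NFKC alone does not remove bidirectional controls, so the stripping step has to be applied separately.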
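A minimal sketch of the normalize-and-strip step, using only Python's standard `unicodedata` module. The function name `sanitize_text` and the `KEEP` whitelist are assumptions made for illustration, not part of the report. Dropping every character in categories Cc and Cf is a deliberately blunt choice: it also removes U+200D (zero width joiner), which emoji sequences and some scripts rely on, so a real pipeline may want a narrower list.

```python
import unicodedata

# Assumption: ordinary whitespace controls are worth keeping.
KEEP = {"\n", "\t", "\r"}

def sanitize_text(text: str) -> str:
    """Apply NFKC, then drop control (Cc) and format (Cf) characters."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in normalized
        # U+202E (RLO) and other bidi controls fall in category Cf.
        if ch in KEEP or unicodedata.category(ch) not in ("Cc", "Cf")
    )
```

Running the same function over user input and retrieved RAG documents, before both the model call and any moderator-facing display, keeps what the reviewer sees aligned with what the model receives.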
Journey Context:
Attackers use Right-to-Left Override (RLO) characters to make a malicious prompt look benign to a human moderator (e.g., 'gnidocS tseT' looks like 'Test Scoding', but the LLM reads the raw string). Because the LLM processes the raw character sequence rather than the rendered text, the hidden command gets past keyword filters and human review and is then executed.
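To make the gap between the rendered text and the raw string visible during review, one option is to flag bidirectional controls explicitly instead of silently removing them. The sketch below is illustrative only: `find_bidi_controls` and `BIDI_CONTROLS` are hypothetical names, and the set covers the bidirectional control code points I am aware of (U+061C, U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069), not every invisible character.

```python
import unicodedata

# Assumption: this set covers the bidirectional control characters only.
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
)

def find_bidi_controls(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each bidi control found."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]

# The example string from the report, with the RLO made explicit.
suspicious = "\u202egnidocS tseT"
hits = find_bidi_controls(suspicious)
```

Showing moderators `ascii(suspicious)` alongside the rendered text is another cheap check, since it prints the override as a `\u202e` escape rather than applying it.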
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:33:41.981238+00:00— report_created — created