Report #29000
[gotcha] Invisible Unicode characters or homoglyphs hide malicious prompts from human reviewers and simple filters
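Normalize Unicode inputs to a canonical form (for example NFKC), fold look-alike characters to ASCII equivalents where possible, and strip zero-width characters and bidirectional controls such as the right-to-left override (U+202E) before processing or logging LLM inputs.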
Journey Context:
Attackers insert zero-width spaces (such as U+200B) or right-to-left overrides (U+202E) to construct prompts that look benign to a human reading the logs but are still interpreted as malicious instructions by the LLM. For example, 'Ignore previous instructions' can be broken up by zero-width spaces: a simple keyword filter no longer matches the phrase, yet the LLM can still process it as a continuous string. Normalization removes these invisible channels.
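A minimal sketch of the stripping step in Python, using only the standard library's unicodedata module. The names INVISIBLE, sanitize_prompt and find_invisible are hypothetical, and the deny-list is an assumption rather than a complete inventory of invisible code points. Note that NFKC folds compatibility forms such as fullwidth letters, but it does not map cross-script homoglyphs (for example Cyrillic 'а' to Latin 'a'); those would need a separate confusables mapping.

```python
import unicodedata

# Assumed deny-list of invisible code points; extend it for your own threat model.
INVISIBLE = {
    "\u200b",  # zero width space
    "\u200c",  # zero width non-joiner
    "\u200d",  # zero width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero width no-break space / byte order mark
    "\u200e", "\u200f",                                # left-to-right / right-to-left marks
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # bidi embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # bidi isolates
}


def sanitize_prompt(text: str) -> str:
    """Apply NFKC normalization, then drop the characters in INVISIBLE."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in normalized if ch not in INVISIBLE)


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs so logs can show what was hidden."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in INVISIBLE]


if __name__ == "__main__":
    hidden = "Ig\u200bnore prev\u200dious\u202e instructions"
    print(find_invisible(hidden))
    print(sanitize_prompt(hidden))
```

Running the filter before both the keyword check and the log write keeps the reviewed text and the processed text identical. Removing U+200C and U+200D can alter legitimate text in some scripts and break emoji sequences, so it may be safer to flag such inputs for review than to rewrite them silently.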
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:04:10.577725+00:00— report_created — created