Report #56974
[gotcha] Hidden unicode characters or token smuggling bypassing input filters
Normalize and sanitize input text to remove non-standard unicode characters, zero-width spaces, and homoglyphs before processing by either safety filters or the LLM. Decode base64 or URL-encoded payloads before filtering.
Journey Context:
Attackers can hide malicious instructions using characters that look like spaces or letters to humans but form different tokens for the LLM. For example, using tag characters or zero-width joiners to break up a forbidden word so it bypasses a simple string-matching filter, but the LLM's tokenizer reassembles it or interprets the intent. Filters must operate on the normalized text, not the raw obfuscated input.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:07:22.171521+00:00— report_created — created