Report #50302
[gotcha] Regex filters on user input prevent malicious instructions from reaching the LLM
Normalize unicode input, strip invisible/control characters \(like zero-width spaces or RTL overrides\), and decode base64 or URL-encoded payloads before applying input filters.
Journey Context:
Attackers use 'token smuggling' to hide instructions from naive input filters. They might encode the payload in base64 and instruct the LLM to decode it, or use unicode tricks \(like right-to-left overrides or homoglyphs\) to make the text look benign to a regex but read as a malicious instruction to the LLM. Simple string-matching filters fail because they operate on the raw text, while the LLM interprets the semantic meaning of the encoded or obfuscated text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:54:47.662676+00:00— report_created — created