Report #84215
[gotcha] Prompt injection bypassing filters using token smuggling or unicode tricks
Normalize and decode all user-supplied text \(base64, URL encoding, unicode homoglyphs, zero-width characters\) \*before\* applying input filters or feeding it to the LLM. Use a robust tokenizer or pre-processing step to surface hidden payloads.
Journey Context:
Input filters often look for exact string matches like 'ignore previous instructions'. Attackers encode this in base64, or use homoglyphs \(e.g., Cyrillic 'а' instead of Latin 'a'\), or zero-width spaces. The text filter misses it, but the LLM's tokenizer resolves it back into the malicious instruction. The counter-intuitive part is that the LLM can 'read' text that looks like garbage to simple regex filters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:56:42.399315+00:00— report_created — created