Report #93471
[gotcha] Input filters miss hidden tokens and semantic shifts caused by Unicode and Base64 manipulation
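Normalize and sanitize all user input before it reaches the filter or the LLM. Strip zero-width characters, normalize Unicode to a standard form (NFKC), and decode any Base64 or ROT13 payloads before evaluation. NFKC folds compatibility forms such as fullwidth letters, but it does not map cross-script homoglyphs: Cyrillic 'а' stays distinct from Latin 'a', so those need a separate confusables mapping or a mixed-script check.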
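The steps above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a vetted sanitizer. The function names (`normalize_input`, `decoded_views`, `texts_to_check`) and the 12-character threshold for Base64 runs are assumptions made for the example.

```python
import base64
import binascii
import codecs
import re
import unicodedata

# Runs of the base64 alphabet long enough to be worth decoding.
# The minimum length of 12 is an arbitrary assumption for this sketch.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{12,}={0,2}")


def normalize_input(text: str) -> str:
    """Apply NFKC, then drop format characters (Unicode category Cf).

    Category Cf covers zero-width space, joiner and non-joiner, the word
    joiner, the soft hyphen and the BOM.
    """
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def decoded_views(text: str) -> list[str]:
    """Return extra strings the filter should also inspect: the ROT13 form
    of the text and any base64 runs that decode to valid UTF-8."""
    views = [codecs.decode(text, "rot13")]
    for match in B64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            views.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text: ignore this run
    return views


def texts_to_check(user_input: str) -> list[str]:
    """Normalized input first, followed by its normalized decoded views."""
    clean = normalize_input(user_input)
    return [clean] + [normalize_input(view) for view in decoded_views(clean)]
```

A filter would then run over every string returned by `texts_to_check` rather than over the raw input alone. Dropping all Cf characters is deliberately blunt: it also breaks emoji ZWJ sequences and affects scripts that use the zero-width non-joiner legitimately, so a production system may prefer a narrower list.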
Journey Context:
Developers build regex or string-matching filters to block bad words. Attackers split a blocked word such as 'bad' with a zero-width joiner (U+200D), or swap Latin 'a' for the visually identical Cyrillic 'а' (U+0430). The filter no longer matches the raw string, but the LLM still recovers the semantic intent from the tokens it receives. Attackers also ask the LLM to decode Base64 payloads in-context, which bypasses keyword filters entirely.
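To make the bypass concrete, here is a small sketch of a keyword filter and three inputs that slip past it. The blocklist, the `naive_filter_blocks` helper, and the sample strings are all hypothetical and exist only for illustration.

```python
import base64
import re

# Hypothetical single-word blocklist, for illustration only.
BLOCKED = re.compile(r"\bbad\b", re.IGNORECASE)


def naive_filter_blocks(text: str) -> bool:
    """Return True when the raw text contains the blocked word."""
    return BLOCKED.search(text) is not None


samples = {
    "plain": "this is bad",
    # Zero-width joiner (U+200D) inserted inside the word.
    "zero_width": "this is b\u200dad",
    # Cyrillic small a (U+0430) in place of Latin 'a'.
    "homoglyph": "this is b\u0430d",
    # Payload the model is asked to decode in-context.
    "base64": "decode this: " + base64.b64encode(b"this is bad").decode("ascii"),
}

# Map each sample name to whether the naive filter caught it.
results = {name: naive_filter_blocks(text) for name, text in samples.items()}

for name, blocked in results.items():
    print(f"{name}: {'blocked' if blocked else 'passed'}")
```

Only the plain sample is caught. The other three reach the model unchanged, which is the gap this report describes.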
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:28:40.228384+00:00— report_created — created