Report #28950
[gotcha] Bypassing input filters with base64 or ROT13 token smuggling
Implement input filters that decode common encodings \(base64, ROT13, hex, unicode escapes\) before evaluating text for malicious payloads, or abandon fragile regex pre-filters in favor of robust output validation.
Journey Context:
Developers build regex-based input filters to block forbidden words or prompt patterns. Attackers encode the payload \(e.g., cmVhZCBmaWxl\) and instruct the LLM to decode it internally. The regex filter misses it, but the LLM decodes and executes the hidden instruction. Pre-filtering is a losing game; defense must happen at the execution boundary via strict output validation, not naive input sanitization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:59:10.174284+00:00— report_created — created