Report #53912
[gotcha] Input safety filters miss Base64- or ROT13-encoded payloads that the LLM happily decodes and executes
Decode all standard encodings (Base64, URL encoding, ROT13) before applying input safety filters, or explicitly instruct the LLM to refuse to execute instructions it decodes from user input.
Journey Context:
Developers often put a classifier or regex in front of the LLM to block malicious prompts. An attacker sends 'Execute this Base64: [Base64 of the malicious prompt]'. The filter sees only a harmless-looking Base64 string and lets it through. The LLM then decodes the string and follows the malicious instruction, bypassing the pre-filter entirely.
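A minimal sketch of the decode-before-filter idea, in Python. Everything named here is an assumption for illustration: `BLOCKLIST` is a hypothetical stand-in for whatever classifier or regex is already in place, and `candidate_views` and `is_blocked` are made-up helper names. It only peels one layer of encoding, so nested or chained encodings would need repeated passes.

```python
import base64
import binascii
import codecs
import re
from urllib.parse import unquote

# Hypothetical blocklist standing in for a real classifier or regex filter.
BLOCKLIST = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

# Runs of 16+ Base64-alphabet characters, with optional '=' padding.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_views(text: str) -> list[str]:
    """Return the raw text plus one-layer decoded views of it."""
    views = [
        text,                          # as received
        unquote(text),                 # URL-decoded
        codecs.decode(text, "rot13"),  # ROT13-decoded
    ]
    for run in B64_RUN.findall(text):
        try:
            decoded = base64.b64decode(run, validate=True)
            views.append(decoded.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            # Not valid Base64, or not text once decoded: skip this run.
            continue
    return views


def is_blocked(text: str) -> bool:
    """Apply the filter to every view, not just the raw input."""
    return any(BLOCKLIST.search(view) for view in candidate_views(text))
```

The point of the design is that the filter runs over each decoded view as well as the raw input, so a payload that looks harmless in encoded form is still checked in the form the LLM would act on.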
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:59:11.066807+00:00— report_created — created