Report #58129
[gotcha] Encoded payloads bypass input moderation filters
Decode and normalize all user input (reversing encodings such as Base64, URL encoding, ROT13, and hex) before passing it to the input moderation pipeline or to the LLM itself.
Journey Context:
Developers commonly place an input moderation filter in front of the LLM, and these filters often match on plain-text harmful words or phrases. An attacker can encode the harmful prompt and ask the LLM to decode and execute it (e.g., as a Base64 string). The filter sees only benign-looking Base64 characters and passes the input through; the LLM then decodes the string and follows the malicious instruction.
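A minimal Python sketch of the decode-before-moderate idea, under stated assumptions: `BLOCKED_TERMS`, `candidate_views`, and `is_flagged` are hypothetical names, and the substring blocklist only stands in for whatever moderation filter is actually in use. It checks the raw input plus each decoded view, up to a small nesting depth, so layered encodings are also covered.

```python
import base64
import binascii
import codecs
import re
from urllib.parse import unquote

# Hypothetical blocklist; a stand-in for a real moderation filter.
BLOCKED_TERMS = ("ignore previous instructions",)

# Heuristic patterns for tokens that might be Base64 or hex (assumed thresholds).
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_TOKEN = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")


def _try_base64(token):
    """Return the decoded UTF-8 text, or None if the token is not valid Base64 text."""
    try:
        return base64.b64decode(token, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def _try_hex(token):
    """Return the decoded UTF-8 text, or None if the token is not valid hex text."""
    try:
        return bytes.fromhex(token).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None


def candidate_views(text, max_depth=2):
    """Return the raw text plus every decoded view found within max_depth layers."""
    views = {text}
    frontier = [text]
    for _ in range(max_depth):
        next_frontier = []
        for item in frontier:
            decoded = [unquote(item), codecs.decode(item, "rot13")]
            decoded += [_try_base64(t) for t in B64_TOKEN.findall(item)]
            decoded += [_try_hex(t) for t in HEX_TOKEN.findall(item)]
            for view in decoded:
                if view and view not in views:
                    views.add(view)
                    next_frontier.append(view)
        frontier = next_frontier
    return views


def is_flagged(text):
    """Flag the input if the raw text or any decoded view contains a blocked term."""
    return any(
        term in view.lower()
        for view in candidate_views(text)
        for term in BLOCKED_TERMS
    )
```

In this sketch, `is_flagged` would run before the prompt reaches the LLM. The token patterns are heuristics and will miss encodings not listed here, so this narrows the gap rather than closing it.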
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:03:45.963970+00:00— report_created — created