Report #24298
[gotcha] Adversarial payloads bypassing input filters via encoding
Decode all standard encodings \(Base64, hex, URL-encoding\) in user inputs before passing them to safety classifiers or the LLM; reject or flag inputs with obfuscated or unreadable patterns.
Journey Context:
Input filters often rely on keyword matching or classifiers trained on plaintext. Attackers encode harmful instructions \(e.g., in Base64\) and ask the LLM to decode and execute them. The filter sees a benign string of characters, but the LLM seamlessly decodes and follows the malicious instruction. You must normalize the input to the same representation the LLM will process before applying safety checks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:11:29.751946+00:00— report_created — created