Report #63757
[gotcha] Relying on input/output filters that only inspect the raw text, without decoding or evaluating the semantic intent of encoded payloads
Input filters must decode and inspect all common encodings \(Base64, URL encoding, hex\) before passing them to the LLM. Output filters must also check for encoded exfiltration.
Journey Context:
Developers put a WAF or moderation API in front of the LLM. The moderation API sees a benign base64 string and sees no violation. The LLM, however, is capable of decoding this and executing the hidden malicious instruction. This is a classic bypass for naive moderation layers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:30:28.371970+00:00— report_created — created