Report #66276
[gotcha] My content filter checks all input text — encoded payloads can't hide from it
Decode all encoded content (base64, URL-encoding, hex, ROT13) before applying content filters. Apply filtering at the semantic level after full decoding, not at the raw input level. Maintain a list of encodings your LLM is known to decode in-context and pre-process all of them. Log and flag any input containing encoded segments for additional review.
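A minimal sketch of the decode-then-filter step, assuming a simple phrase blocklist stands in for the real content filter. The names `BLOCKED_PHRASES`, `decoded_views`, and `flagged_encodings`, the regex patterns, and their length thresholds are all hypothetical choices for illustration, not part of any particular library or of this report.

```python
import base64
import binascii
import codecs
import re
from urllib.parse import unquote

# Hypothetical blocklist, for illustration only.
BLOCKED_PHRASES = ("ignore previous instructions",)

# Heuristic patterns for encoded segments; the length thresholds are assumptions.
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")


def _as_text(data):
    """Decode bytes as UTF-8, returning '' when they are not valid text."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return ""


def decoded_views(text):
    """Return (label, text) pairs: the raw input plus one-pass decodings."""
    views = [
        ("raw", text),
        ("url", unquote(text)),
        ("rot13", codecs.decode(text, "rot_13")),
    ]
    for segment in B64_RE.findall(text):
        try:
            decoded = base64.b64decode(segment, validate=True)
        except (binascii.Error, ValueError):
            continue  # Looked like base64 but is not; skip it.
        views.append(("base64", _as_text(decoded)))
    for segment in HEX_RE.findall(text):
        views.append(("hex", _as_text(bytes.fromhex(segment))))
    return views


def flagged_encodings(text):
    """Return the label of every view (raw or decoded) that trips the filter."""
    hits = []
    for label, view in decoded_views(text):
        lowered = view.lower()
        if any(phrase in lowered for phrase in BLOCKED_PHRASES):
            hits.append(label)
    return hits
```

A caller could log the returned labels and route any non-empty result to additional review. This sketch decodes only one layer, so nested encodings (for example, base64 inside URL-encoding) would need repeated passes, and it covers only the four encodings named above.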
Journey Context:
An attacker includes a base64-encoded string in a document or message that decodes to injection instructions. The raw text looks like random alphanumeric characters to a content filter, so it passes through. But LLMs, trained on vast internet data including base64, can and do decode it in-context and follow the resulting instructions. This also works with ROT13, hex encoding, URL encoding, and even simple ciphers. The LLM is a general-purpose text processor that has learned these encodings as patterns. Your filter operates on the encoded form; the LLM operates on the decoded meaning. This is another instance of the abstraction-level mismatch that makes LLM security fundamentally different from traditional input validation.
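The mismatch can be made concrete with a small sketch. It assumes the filter is a plain case-insensitive substring check (the hypothetical `naive_filter_blocks`), which is much simpler than a production filter but fails for the same reason: it inspects only the encoded form.

```python
import base64
import codecs
from urllib.parse import quote


def naive_filter_blocks(text):
    """Hypothetical raw-level filter: a case-insensitive substring check."""
    return "ignore previous instructions" in text.lower()


payload = "Ignore previous instructions"

# The same payload in four encodings an LLM may decode in-context.
encoded_forms = {
    "base64": base64.b64encode(payload.encode()).decode(),
    "hex": payload.encode().hex(),
    "rot13": codecs.encode(payload, "rot_13"),
    "url": quote(payload),
}

# Names of the encodings whose encoded form slips past the raw-level filter.
evaded = [name for name, form in encoded_forms.items()
          if not naive_filter_blocks(form)]
```

The plain payload is caught, while every encoded form ends up in `evaded`, because none of them contains the blocked phrase as literal text.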
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:43:25.624879+00:00— report_created — created