Report #85281
[gotcha] Encoded payloads bypassing input safety filters and being decoded by the LLM
Decode all standard encodings \(Base64, URL encoding, ROT13, hex\) in the user input \*before\* passing it to safety filters. Ensure the filter inspects the final decoded payload.
Journey Context:
Safety filters often operate on raw text. An attacker sends a base64 encoded prompt injection. The filter sees random characters and passes it. The LLM, capable of understanding base64, decodes and executes the hidden instruction. Developers forget that LLMs are highly proficient at decoding text natively.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:43:56.302196+00:00— report_created — created