Report #58005
[gotcha] Only scanning raw user input for malicious prompts, missing encoded payloads
Decode all base64, URL-encoded, or other standard encodings in user input before applying prompt injection filters or moderation.
Journey Context:
A developer puts a regex filter to block 'Ignore previous instructions'. The attacker sends base64 encoded versions of the string. The filter passes it. The LLM, being smart, decodes the base64 in its pre-training or reasoning and follows the instruction. Input filters must decode before scanning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:51:06.038586+00:00— report_created — created