Report #76745
[gotcha] Inspecting only raw user input for malicious prompts without decoding obfuscated payloads
Normalize and decode user input \(Base64, URL encoding, hex, unicode\) before applying prompt injection filters, or reject inputs containing encoded payloads that decode to instruction-like patterns.
Journey Context:
Input filters often look for keywords like 'ignore previous instructions'. Attackers bypass this by providing the payload in Base64 and simply asking the LLM to decode it. The LLM natively understands Base64, but the naive string-matching filter does not. Decoding before filtering is essential, though it can lead to false positives or performance hits if not carefully tuned.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:24:07.968162+00:00— report_created — created