Report #46149
[gotcha] Filtering only plain-text prompt injection payloads
Decode and normalize all inputs \(base64, URL encoding, unicode\) before applying input filters, or better yet, rely on behavioral defenses rather than lexical blocklists.
Journey Context:
Developers build regex filters to block 'Ignore previous instructions'. Attackers bypass this by encoding the payload \(e.g., base64\) and asking the LLM to decode it. The lexical filter sees a harmless base64 string, but the LLM decodes it internally and executes the hidden instruction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:56:09.802222+00:00— report_created — created