Report #45424
[gotcha] Keyword filters bypassed by encoding payloads that the LLM decodes internally
Do not rely on input/output keyword blocklists. If you must filter, decode all standard encodings (Base64, ROT13, Hex) before applying blocklists, and use semantic classifiers rather than string matching.
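A minimal sketch of the "decode before applying blocklists" step, assuming a single hypothetical blocked phrase and only the three encodings named above. The names `BLOCKLIST`, `candidate_decodings`, and `is_blocked` are illustrative and not part of any library. It does not handle nested or chained encodings, and it is no substitute for a semantic classifier.

```python
import base64
import binascii
import codecs
import re

# Hypothetical blocklist entry, for illustration only
BLOCKLIST = ("ignore previous instructions",)

def candidate_decodings(text):
    """Return the raw text plus plausible ROT13, Base64, and hex decodings."""
    candidates = [text, codecs.decode(text, "rot_13")]
    # Try each run of 8+ characters drawn from the Base64 alphabet
    # (hex digits are a subset of that alphabet)
    for token in re.findall(r"[A-Za-z0-9+/=]{8,}", text):
        try:
            candidates.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            pass  # not valid Base64, or not UTF-8 text once decoded
        try:
            candidates.append(bytes.fromhex(token).decode("utf-8"))
        except ValueError:
            pass  # not valid hex, or not UTF-8 text once decoded
    return candidates

def is_blocked(text):
    """Apply the blocklist to the raw text and to every candidate decoding."""
    return any(
        phrase in candidate.lower()
        for candidate in candidate_decodings(text)
        for phrase in BLOCKLIST
    )
```

Checking every candidate decoding rather than guessing one encoding keeps the filter from being defeated by a simple switch of scheme, at the cost of extra work per request.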
Journey Context:
Developers build regex or keyword filters to block known attack phrases. However, LLMs can read Base64, ROT13, and Hex-encoded text. An attacker submits an encoded payload: the filter sees no blocked string and lets it through, while the model decodes the payload and follows it. The encoding defeats the naive keyword filter but leaves the instruction fully legible to the model.
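The bypass can be reproduced without a model. The sketch below assumes a hypothetical one-pattern filter (`BLOCKED`, `naive_filter_allows`) and shows that the same phrase passes it once encoded. Whether a given model then decodes and obeys the payload depends on the model and is not shown here.

```python
import base64
import codecs
import re

# Hypothetical blocklist pattern, for illustration only
BLOCKED = re.compile(r"ignore previous instructions", re.IGNORECASE)

def naive_filter_allows(prompt):
    """Return True when the raw-string filter lets the prompt through."""
    return BLOCKED.search(prompt) is None

payload = "ignore previous instructions"

# The same payload in the three encodings named in this report
encoded_variants = {
    "base64": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
    "rot13": codecs.encode(payload, "rot_13"),
    "hex": payload.encode("utf-8").hex(),
}

# Maps each encoding name to whether the naive filter allowed it
bypassed = {
    name: naive_filter_allows(text) for name, text in encoded_variants.items()
}
```

The plain payload is rejected, while every entry in `bypassed` comes back allowed, because none of the encoded strings contains the blocked phrase as literal text.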
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:42:54.522133+00:00— report_created — created