Report #74009
[gotcha] I scan retrieved documents for obvious prompt injection phrases, so encoded payloads are caught
Decode and scan all encoded content (base64, URL encoding, hex, Unicode escapes, ROT13) within retrieved documents before passing them to the LLM. Run injection detection on the decoded text, not only on the raw text. Consider stripping or flagging any document whose decoded content contains instructions the LLM could act on.
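The decode-then-scan step can be sketched as follows. This is a minimal illustration, not a vetted filter: the single regex in INJECTION_PATTERNS, the minimum run lengths, and the function names candidate_decodings and find_injections are all assumptions made for the example. It decodes only one level, so nested encodings (for example base64 inside URL encoding) would need repeated passes.

```python
import base64
import codecs
import re
from urllib.parse import unquote

# Hypothetical detector: one illustrative pattern, not a complete rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE),
]

# Assumed thresholds: short runs are skipped to limit false candidates.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RUN = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")
UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def candidate_decodings(text):
    """Yield (label, decoded_text) pairs for one level of each encoding."""
    url_decoded = unquote(text)
    if url_decoded != text:
        yield "url", url_decoded

    unescaped = UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    if unescaped != text:
        yield "unicode_escape", unescaped

    yield "rot13", codecs.decode(text, "rot13")

    for match in BASE64_RUN.finditer(text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except ValueError:  # bad padding, bad alphabet, or not UTF-8 text
            continue
        yield "base64", decoded

    for match in HEX_RUN.finditer(text):
        try:
            decoded = bytes.fromhex(match.group(0)).decode("utf-8")
        except ValueError:  # decoded bytes are not UTF-8 text
            continue
        yield "hex", decoded


def find_injections(text):
    """Return the label of every view of the text that matches a pattern."""
    hits = []
    views = [("raw", text), *candidate_decodings(text)]
    for label, view in views:
        if any(p.search(view) for p in INJECTION_PATTERNS):
            hits.append(label)
    return hits


print(find_injections("Notes: aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw=="))  # ['base64']
```

Returning labels rather than a boolean lets the caller decide whether to strip, flag, or reject, which matters when encoded content is otherwise legitimate.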
Journey Context:
Naive string-matching filters look for phrases like 'ignore previous instructions'. Attackers instead embed the base64 form in a document: the filter sees 'aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==' and passes it through. An LLM that can decode base64 may then read and follow the hidden instruction. Any encoding the LLM can interpret becomes an attack channel. This is particularly insidious because encoding is normal in technical documents, so you cannot simply reject all encoded content without crippling legitimate use cases such as code-QA systems.
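The bypass is easy to reproduce. In this small sketch the BLOCKLIST contents and the naive_filter_passes helper are hypothetical stand-ins for whatever phrase filter is in use:

```python
import base64

BLOCKLIST = ["ignore previous instructions"]  # hypothetical phrase list


def naive_filter_passes(document: str) -> bool:
    """Return True when no blocklisted phrase appears in the raw text."""
    lowered = document.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)


payload = base64.b64encode(b"ignore previous instructions").decode("ascii")
document = f"Build notes. Token: {payload}"

print(payload)                        # aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==
print(naive_filter_passes(document))  # True: the encoded payload slips through
```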
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:49:26.771692+00:00— report_created — created