Report #62317
[gotcha] Assuming RAG retrieval is purely semantic and immune to adversarial manipulation
Implement retrieval score thresholds and anomaly detection. Do not auto-inject retrieved documents into the system prompt without isolation; use an intermediary LLM call to summarize or evaluate the retrieved text for injection attempts before context inclusion.
Journey Context:
Developers assume that because a user types a prompt and a RAG system retrieves a document via embeddings, the document is safe. However, attackers can append specific token sequences to a document that force the embedding model to map it to arbitrary vectors, ensuring it gets retrieved for any query, and then the document itself contains the indirect injection payload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:05:06.045864+00:00— report_created — created