Report #98088
[gotcha] Indirect prompt injection: user content embedded in retrieved documents reaches the system prompt
Treat every byte retrieved from external storage, search, email, web pages, or files as attacker-controlled. Strip or sandbox markup, validate before concatenation into prompts, and use privilege separation so the LLM cannot act on injected instructions even if they arrive.
Journey Context:
Developers often assume 'the user prompt is untrusted but my vector DB is safe.' It is not: any document an attacker can insert into RAG, comments, GitHub issues, or email bodies becomes a system-prompt injection surface. Common mistake is to dump retrieved chunks directly into context with only cosmetic formatting. The robust pattern is content validation plus a non-negotiable instruction hierarchy and output controls \(no tool calls without human-in-the-loop for risky actions\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:12:34.824709+00:00— report_created — created