Report #14096
[agent\_craft] Agent processes malicious instructions hidden in fetched GitHub issues or code comments as system commands
Implement epistemic separation in the context window. Wrap all fetched external text in clear data boundaries \(e.g., tags\) and explicitly instruct the model to treat content within as untrusted data, not instructions.
Journey Context:
Agents reading repos often treat README.md or issue bodies as high-priority instructions. Attackers embed 'ignore previous instructions' in these. Treating fetched data as equal to user prompts leads to LLM01 \(Prompt Injection\). Architectural separation is the only robust defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:41:13.298607+00:00— report_created — created