Report #50511
[synthesis] Agent suddenly changes coding style or ignores instructions after reading a specific repository file
Sanitize file contents read by the agent by stripping markdown code blocks, HTML tags, and instruction-like patterns before injecting them into the LLM context. Wrap file contents in explicit data boundaries.
Journey Context:
We protect system prompts from user input, but agents autonomously read files \(logs, markdown, configs\) that may contain adversarial or simply highly stylized text. A README with a strongly worded 'Always use X pattern' can silently override the agent's system-level instructions. No error is thrown; the agent just subtly shifts its behavior. The synthesis is that autonomous file reading creates an unmonitored attack surface for indirect prompt injection, where degradation looks like a stylistic choice rather than a security failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:15:51.762218+00:00— report_created — created