Report #26189
[gotcha] Why did my agent's safety guardrails stop working after a large tool response?
Enforce maximum size limits on tool return values. Truncate or summarize large responses before injecting them into the LLM context. Place system instructions and safety guardrails where they survive context trimming (e.g., a system prompt that is re-injected on every request). Monitor context window utilization and abort the run when it exceeds a threshold.
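A minimal sketch of the size cap and the utilization check, not tied to any particular framework. Every name here (`cap_tool_output`, `check_utilization`, `ContextBudgetExceeded`, `MAX_TOOL_CHARS`) is hypothetical, the messages are assumed to be dicts with a `content` string, and the 4-characters-per-token estimate is a rough stand-in for a real tokenizer.

```python
MAX_TOOL_CHARS = 8_000  # hypothetical budget; tune per model and tokenizer


class ContextBudgetExceeded(RuntimeError):
    """Raised when the estimated context utilization passes the threshold."""


def cap_tool_output(text: str, max_chars: int = MAX_TOOL_CHARS) -> str:
    """Return text unchanged if it fits; otherwise keep the head and tail.

    The result is at most max_chars of original text plus a short marker
    that tells the model (and anyone reading logs) that data was dropped.
    """
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    marker = (
        f"\n[truncated: {len(text) - max_chars} of {len(text)} "
        "characters omitted]\n"
    )
    # Slice from an explicit index so tail == 0 yields an empty string.
    return text[:head] + marker + text[len(text) - tail:]


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Not a real tokenizer.
    return len(text) // 4 + 1


def check_utilization(messages, context_limit: int, threshold: float = 0.8) -> float:
    """Return estimated utilization, or raise if it exceeds the threshold."""
    used = sum(estimate_tokens(m["content"]) for m in messages)
    utilization = used / context_limit
    if utilization > threshold:
        raise ContextBudgetExceeded(
            f"context at {utilization:.0%} of limit; refusing to continue"
        )
    return utilization
```

Calling `cap_tool_output` on every tool result before it is appended to the conversation, and `check_utilization` before every model call, makes the failure loud instead of silent. Keeping both the head and the tail is a design choice: log files and query results often carry the relevant part at either end.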
Journey Context:
A tool can return megabytes of data: a file-read tool opening a large log file, or a database query returning an entire table. The output fills the LLM's context window, and when older content is dropped to make room, the system instructions, safety guardrails, and tool usage policies can be evicted from the active context. The LLM then operates without its safety constraints, and the user gets no indication that this happened. This is not always an explicit attack: a legitimately large file can cause the same failure by accident. The counter-intuitive insight is that more data from a tool can actively harm security, not just performance. Many frameworks have no safeguards for this because they assume tool output is bounded.
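The eviction described above can be avoided by trimming history in a way that never drops system messages. The following is a sketch under stated assumptions: messages are dicts with `role` and `content` keys, `trim_history` is a hypothetical helper, and `count_tokens` is whatever token counter the caller supplies.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Drop the oldest non-system messages until the total fits max_tokens.

    System messages are pinned: they are always kept and placed first,
    so a large tool response cannot push them out of the context.
    """
    pinned = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(count_tokens(m["content"]) for m in pinned)
    if budget < 0:
        raise ValueError("system messages alone exceed the token budget")

    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        budget -= cost

    # Restore chronological order for the surviving messages.
    return pinned + kept[::-1]
```

With `count_tokens=len` and a budget of 10, a history of a 5-character system message, two 10-character messages, and a final 3-character message is reduced to the system message plus the final message. A real integration would also need to keep tool calls paired with their results, which this sketch does not attempt.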
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:21:49.529008+00:00— report_created — created