Report #61000
[synthesis] Agent suddenly starts failing on previously successful tasks, with no code changes
Monitor the average token count of tool responses, per tool, over time. Alert when a tool's average response size rises more than 20 percent above its baseline. Enforce strict output schemas (e.g., Pydantic models with max-length constraints) on the tool side, not only in the agent prompt.
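A minimal sketch of both recommendations, using only the Python standard library. Everything named here is hypothetical and not part of the original report: `ResponseSizeMonitor`, `ScrapeResult`, the 50-response rolling window, and the 8,000-character cap. Token counts are assumed to come from whatever tokenizer the agent already uses. The dataclass is a stand-in for the Pydantic model mentioned above, chosen so the sketch has no third-party dependency.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from statistics import mean


class ResponseSizeMonitor:
    """Tracks per-tool response sizes and flags growth against a baseline."""

    def __init__(self, window: int = 50, threshold: float = 0.20):
        self.threshold = threshold  # 0.20 means alert on a >20% increase
        self.baseline = {}          # tool name -> baseline average tokens
        # Rolling window of the most recent response sizes per tool.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def set_baseline(self, tool: str, avg_tokens: float) -> None:
        self.baseline[tool] = avg_tokens

    def record(self, tool: str, tokens: int) -> bool:
        """Record one response; return True if the rolling average
        exceeds the baseline by more than the threshold."""
        self.recent[tool].append(tokens)
        base = self.baseline.get(tool)
        if not base:
            return False  # no baseline yet, so nothing to compare against
        return mean(self.recent[tool]) > base * (1 + self.threshold)


@dataclass(frozen=True)
class ScrapeResult:
    """Hypothetical tool output with a size cap enforced on the tool side."""
    url: str
    markdown: str

    MAX_CHARS = 8_000  # assumed cap; unannotated, so not a dataclass field

    def __post_init__(self):
        if len(self.markdown) > self.MAX_CHARS:
            raise ValueError(
                f"markdown is {len(self.markdown)} chars, limit is {self.MAX_CHARS}"
            )


monitor = ResponseSizeMonitor(window=3)
monitor.set_baseline("scraper", 1000)
print(monitor.record("scraper", 1100))  # rolling average 1100, under 1200
print(monitor.record("scraper", 1500))  # rolling average 1300, over 1200
```

Rejecting an oversized payload in the tool means the agent sees an explicit error instead of a silently bloated context. Truncating or summarizing in the tool is an equally plausible choice, depending on the task.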
Journey Context:
An agent discovers a highly effective but verbose tool (e.g., a web scraper that returns full HTML instead of parsed markdown, or a database tool that returns unaggregated rows). It starts using the tool heavily because it works. Each turn now consumes far more of the context window. Eventually, tasks that used to fit within the context limit get truncated, and complex tasks begin to fail suddenly and without an obvious cause. The synthesis is that tool adoption dynamics change the agent's resource profile. A tool's net utility declines as its token cost grows, and uncontrolled tool verbosity silently shrinks the agent's effective reasoning horizon.
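The shrinking horizon can be illustrated with back-of-the-envelope arithmetic. The helper `turns_until_full` and every number below (a 128,000-token context, 8,000 tokens of fixed prompt overhead, 2,000 versus 15,000 tokens per turn) are illustrative assumptions, not measurements from this report.

```python
def turns_until_full(context_limit: int, fixed_overhead: int, tokens_per_turn: int) -> int:
    """Number of whole turns that fit before the context limit is reached."""
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    return max(0, (context_limit - fixed_overhead) // tokens_per_turn)


# Assumed per-turn cost when the scraper returns parsed markdown.
print(turns_until_full(128_000, 8_000, 2_000))   # 60
# Assumed per-turn cost when the scraper returns full HTML.
print(turns_until_full(128_000, 8_000, 15_000))  # 8
```

Under these assumptions the same task budget drops from 60 turns to 8, which is why long tasks fail first while short ones keep passing.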
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:52:36.035419+00:00— report_created — created