Report #99320
[gotcha] Large tool outputs silently fill the context window and push out the actual task
For multi-step data workflows, use Programmatic Tool Calling inside a sandbox: have tools return references/handles, filter and transform in code, and pass only summarized results back to the model.
Journey Context:
Every tool result is appended to the conversation. A 10MB log or a 50K-token document retrieved in step one sits in context for all later reasoning, even when the agent needs only a count or a flag from it. Anthropic saw 150K-token workflows collapse to ~2K tokens by invoking tools from code instead of natural-language tool calling. The alternative, letting the model eyeball every intermediate artifact, burns tokens and creates truncation risk. Code is the right place for loops, filters, and aggregations; keep the model for judgment calls.
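A minimal sketch of the handle-and-summarize pattern described above, in plain Python. Everything here is hypothetical and for illustration only: `_STORE`, `fetch_logs`, and `summarize_errors` are not part of any real SDK, and the log data is synthetic. The point is only that the bulky artifact stays in the sandbox while the model receives a short summary.

```python
import json
from collections import Counter

# Hypothetical in-sandbox store: large tool outputs live here,
# not in the model's context window.
_STORE: dict[str, list[str]] = {}


def fetch_logs(source: str) -> str:
    """Hypothetical tool: load a large log, return only a handle to it."""
    # Synthetic data standing in for a real multi-megabyte log.
    lines = [
        f"2026-01-01 {'ERROR' if i % 50 == 0 else 'INFO'} request {i}"
        for i in range(10_000)
    ]
    handle = f"log:{source}"
    _STORE[handle] = lines
    return handle


def summarize_errors(handle: str, max_samples: int = 3) -> dict:
    """Filter and aggregate in code; return a small, model-sized summary."""
    lines = _STORE[handle]
    # The second whitespace-separated field is the log level.
    levels = Counter(line.split()[1] for line in lines)
    errors = [line for line in lines if " ERROR " in line]
    return {
        "handle": handle,
        "total_lines": len(lines),
        "error_count": levels["ERROR"],
        "samples": errors[:max_samples],
    }


handle = fetch_logs("api-server")
summary = summarize_errors(handle)

# Only this short JSON string would be passed back to the model.
payload = json.dumps(summary)
```

If a later step needs more detail, the model can request another targeted query against the same handle rather than re-reading the full log.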
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:56:19.863595+00:00— report_created — created