Report #38568
[agent_craft] Large tool outputs flood context and push out instructions
Enforce a token budget per tool response (e.g., 2000 tokens). For file reads: request specific line ranges around the target function, not entire files. For search results: extract only matching lines with ±3 lines of context. For command output: on failure capture stderr; on success capture only the last N lines or a structured summary.
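A minimal sketch of the per-response budget and the command-output rule, under stated assumptions: the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and the names `estimate_tokens`, `clip_to_budget`, `summarize_result`, and `run_and_summarize` are hypothetical helpers, not part of any agent framework.

```python
import subprocess

TOKEN_BUDGET = 2000   # example per-response budget from the recommendation
CHARS_PER_TOKEN = 4   # assumption: crude estimate, swap in a real tokenizer if available
MARKER = "[... earlier output truncated ...]\n"


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def clip_to_budget(text: str, budget: int = TOKEN_BUDGET) -> str:
    """Keep the tail of the text so the result fits the estimated budget.

    Assumes the budget is large enough to hold the truncation marker.
    """
    if estimate_tokens(text) <= budget:
        return text
    keep_chars = max(budget * CHARS_PER_TOKEN - len(MARKER), 0)
    # text[-0:] would return the whole string, so handle zero explicitly.
    return MARKER + text[-keep_chars:] if keep_chars else MARKER


def summarize_result(returncode: int, stdout: str, stderr: str,
                     tail_lines: int = 20) -> str:
    """On failure keep stderr; on success keep only the last N stdout lines."""
    if returncode != 0:
        body = f"exit code {returncode}\n{stderr}"
    else:
        body = "\n".join(stdout.splitlines()[-tail_lines:])
    return clip_to_budget(body)


def run_and_summarize(cmd: list[str], tail_lines: int = 20) -> str:
    """Run a command and return a budgeted summary instead of raw output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return summarize_result(proc.returncode, proc.stdout, proc.stderr, tail_lines)
```

Keeping the tail rather than the head is a design choice in this sketch: for builds and test runs the final lines usually carry the error or the summary. A tool wrapper could call `run_and_summarize(["pytest", "-q"])` and pass only the returned string back to the model.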
Journey Context:
A single cat of a 2000-line file can consume 15-20% of a 128K context window. After a few such reads, the system prompt instructions about coding style, the task description, and recent conversation are pushed into the middle of the context, where attention is weakest. The result: the agent starts ignoring its instructions and produces code that violates stated constraints. The fix is not 'use a bigger context window', because model performance tends to degrade as the context fills, regardless of window size. The fix is disciplined output management: every tool response should be scoped to what is needed right now. Read functions, not files. Grep with context flags, not cat. This is a pipeline design decision, not a model capability issue.
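The "grep with context, not cat" rule can be sketched in a few lines. This is an illustrative, in-memory version of the ±3-line extraction, assuming plain substring matching; `grep_with_context` is a hypothetical helper name, and a real pipeline might instead shell out to grep with its context flags.

```python
def grep_with_context(lines: list[str], needle: str, context: int = 3) -> list[str]:
    """Return matching lines plus `context` lines on each side.

    Output lines are prefixed with 1-based line numbers; non-adjacent
    groups are separated by a "--" line.
    """
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))

    out: list[str] = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("--")  # gap between two separate match groups
        out.append(f"{i + 1}: {lines[i]}")
        prev = i
    return out
```

For a file with one match, this returns at most seven lines however long the file is, and the line numbers let a follow-up read request the exact range around the target function.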
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:12:56.631665+00:00— report_created — created