Report #30413
[cost\_intel] Passing full command output and entire file contents back to the model without truncation — single tool call adding 10K\+ tokens to all future turns
Implement smart truncation at the tool output level: cap at 2000-3000 tokens per tool result, show head \+ tail with '\[...N lines truncated...\]' for long outputs, use line-range reads instead of reading entire files, and limit grep/search results to top 20 matches. For test output, extract only failures and summary. Truncate at ingestion, not at the conversation level.
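A minimal sketch of the head \+ tail cap described above, in Python. The function name `truncate_output`, the default budget, and the 4-characters-per-token estimate are illustrative assumptions, not part of the report; a real tokenizer would give tighter bounds.

```python
def truncate_output(text: str, max_tokens: int = 2500, chars_per_token: int = 4) -> str:
    """Cap a tool result at roughly max_tokens, keeping the head and the tail.

    Token counts are estimated as len(text) / chars_per_token, which is a
    rough heuristic (assumption), not a real tokenizer.
    """
    budget = max_tokens * chars_per_token
    if len(text) <= budget:
        return text

    lines = text.splitlines()
    half = budget // 2

    # Take whole lines from the top until half the budget is used.
    head, used = [], 0
    for line in lines:
        if used + len(line) + 1 > half:
            break
        head.append(line)
        used += len(line) + 1

    # Take whole lines from the bottom, never overlapping the head.
    tail, used = [], 0
    for line in reversed(lines[len(head):]):
        if used + len(line) + 1 > half:
            break
        tail.append(line)
        used += len(line) + 1
    tail.reverse()

    # Fallback for output that is one huge line: cut by characters instead.
    if not head and not tail:
        dropped = len(text) - 2 * half
        return f"{text[:half]}\n[...{dropped} chars truncated...]\n{text[-half:]}"

    omitted = len(lines) - len(head) - len(tail)
    return "\n".join(head + [f"[...{omitted} lines truncated...]"] + tail)
```

Applying this inside the tool wrapper, before the result is appended to the message history, is what keeps the verbose output from ever entering the conversation.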
Journey Context:
A single \`npm test\` output can be 10K\+ tokens. A \`git log\` with diffs can be 5K\+ tokens. Reading a 500-line module can be 8K\+ tokens. Each of these is appended to the conversation and paid for again on every subsequent turn, so an untruncated tool output from turn 3 is still being billed at turn 20. The model doesn't need 500 lines of passing test output; it needs the summary and any failures. Smart truncation preserves the signal \(error messages, stack traces, relevant matches\) while cutting 70-80% of the token cost. The critical insight is to truncate at the tool output level, before the result enters the conversation, rather than at the context window level after it has accumulated. Once a verbose tool output is in the history, it can't be removed without losing conversation coherence, so pre-filtering is the only reliable approach.
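For test output specifically, the "failures and summary only" filter could look roughly like the sketch below. The function name `extract_failures`, the failure regex, and the assumption that the summary sits in the last few lines are all hypothetical; they are not tied to the output format of any particular test runner and would need adjusting per tool.

```python
import re

# Hypothetical failure markers; adjust for the test runner in use.
FAILURE_PATTERN = re.compile(r"FAIL|Error|ERROR")


def extract_failures(output: str, context: int = 3, summary_lines: int = 5) -> str:
    """Keep failure lines (plus a few lines of context) and the trailing summary."""
    lines = output.splitlines()
    keep = set()

    # Keep each matching line and the `context` lines that follow it.
    for i, line in enumerate(lines):
        if FAILURE_PATTERN.search(line):
            keep.update(range(i, min(len(lines), i + context + 1)))

    # Assume the runner prints its summary at the end of the output.
    keep.update(range(max(0, len(lines) - summary_lines), len(lines)))

    # Rebuild the output, marking every gap with the number of lines dropped.
    result = []
    prev = -1
    for i in sorted(keep):
        if i != prev + 1:
            result.append(f"[...{i - prev - 1} lines truncated...]")
        result.append(lines[i])
        prev = i
    return "\n".join(result)
```

Keeping a few lines after each match is a deliberate choice: stack traces and assertion details usually follow the failure line, and they are the part the model actually needs.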
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:26:05.268184+00:00— report_created — created