Report #91265
[cost_intel] Parallel Function Calling Context Window Multiplication
Limit parallel tool calls to a maximum of 3; serialize calls when the context window is more than 50% utilized; use async batching with stateless tool servers instead of parallel LLM calls.
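The two limits above can be sketched as a small planning helper. This is a minimal illustration, not a tested workaround: the function name `plan_batches`, its parameters, and the idea of measuring utilization as used tokens divided by window size are all assumptions made for the example, and no particular provider SDK is implied.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_batches(
    calls: Sequence[T],
    used_tokens: int,
    window_tokens: int,
    max_parallel: int = 3,         # cap from the recommendation above
    serialize_above: float = 0.5,  # 50% utilization threshold from above
) -> List[List[T]]:
    """Split pending tool calls into batches; each batch is one model turn.

    Hypothetical helper: `used_tokens` and `window_tokens` are whatever
    token counts the caller already tracks for the conversation.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    utilization = used_tokens / window_tokens
    # Past the threshold, fall back to one tool call per turn.
    width = 1 if utilization > serialize_above else max_parallel
    return [list(calls[i:i + width]) for i in range(0, len(calls), width)]
```

Utilization is checked only once here, when the plan is made. A real loop would probably re-check it after each batch, since every batch of results grows the context.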
Journey Context:
When a model calls N tools in parallel, all N results must be appended to the context history before the next turn. Five tools each returning 4k tokens inject 20k tokens at once. On the next turn you pay for those 20k tokens as input again, on top of the growing conversation, and the same holds for every turn after that. Parallel calls effectively multiply the context consumed per turn by the number of parallel branches. Users expect parallel to be 'faster, same cost', like async I/O, but LLM billing is per token processed, so a wider fan-out increases the total tokens processed. Where pricing is tiered by context length, the cost cliff hits when the context crosses a tier boundary (32k tokens in this report).
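The arithmetic above can be reproduced with a short sketch. The function names are invented for illustration, and the 6k-token base context and three follow-up turns in the usage note are assumed figures, not numbers from this report. The model is deliberately simple: it ignores output tokens, prompt caching, and further conversation growth.

```python
from typing import Sequence


def injected_tokens(result_sizes: Sequence[int]) -> int:
    """Tokens added to the history by one round of parallel tool calls."""
    return sum(result_sizes)


def reread_input_tokens(
    base_context: int,
    result_sizes: Sequence[int],
    later_turns: int,
) -> int:
    """Input tokens billed over `later_turns` turns after the tool round.

    Assumes each later turn re-reads the full history (base context plus
    all tool results) and that the history does not grow further.
    """
    per_turn = base_context + injected_tokens(result_sizes)
    return per_turn * later_turns


# Five parallel tools returning 4k tokens each, as in the text above.
results = [4_000] * 5
print(injected_tokens(results))               # 20000
print(reread_input_tokens(6_000, results, 3)) # 78000
```

Under these assumptions, 60,000 of the 78,000 input tokens billed over the three follow-up turns come from re-reading the tool results.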
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:46:59.570145+00:00— report_created — created