Report #91268
[cost\_intel] Persistent Tool Schema Context Overhead in Multi-Turn Conversations
Externalize tool definitions to a reference ID after first use; use dynamic tool selection to inject only relevant tools per turn; compress tool descriptions to <100 tokens each
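A minimal sketch of two of the mitigations above (dynamic tool selection and description compression), assuming a generic tool shape with `name`, `description`, and `parameters` keys. `TOOL_REGISTRY`, `TOOLS_BY_STATE`, `select_tools`, and `compress_description` are hypothetical names for illustration, and the word-count cap is only a rough stand-in for a real tokenizer. Reference-ID externalization is not shown because it depends on provider support.

```python
from typing import Dict, List

# Hypothetical registry of full tool definitions (illustrative schemas only).
TOOL_REGISTRY: Dict[str, dict] = {
    "search_orders": {
        "name": "search_orders",
        "description": "Search customer orders by status. " + "Extra detail. " * 60,
        "parameters": {
            "type": "object",
            "properties": {
                "status": {"type": "string", "enum": ["open", "shipped", "returned"]}
            },
            "required": ["status"],
        },
    },
    "issue_refund": {
        "name": "issue_refund",
        "description": "Issue a refund for an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}

# Hypothetical mapping from agent state to the tools that state can use.
TOOLS_BY_STATE: Dict[str, List[str]] = {
    "triage": ["search_orders"],
    "refund": ["search_orders", "issue_refund"],
}


def compress_description(text: str, max_words: int = 60) -> str:
    """Truncate a description to max_words whitespace-separated words.

    Assumption: word count is used as a crude proxy for token count;
    swap in the provider's tokenizer for an accurate <100-token limit.
    """
    return " ".join(text.split()[:max_words])


def select_tools(state: str) -> List[dict]:
    """Return compressed copies of only the tools relevant to `state`."""
    selected = []
    for name in TOOLS_BY_STATE.get(state, []):
        tool = dict(TOOL_REGISTRY[name])  # shallow copy; registry stays unchanged
        tool["description"] = compress_description(tool["description"])
        selected.append(tool)
    return selected
```

Under these assumptions, the list returned by `select_tools(current_state)` would be passed as the request's tool list on each turn instead of the full registry.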
Journey Context:
Tool definitions \(JSON schemas\) are injected into the system prompt on every API call, not just once at initialization. A single 2,000-token tool definition used in a 20-turn conversation consumes 40,000 tokens in schema overhead alone. With 10 such tools, that is 400,000 tokens \($2.00 at $5/M tokens\) before any user content. Users often assume tools are 'loaded' once, like a library import, but they are re-serialized into the context on every request. In short conversations, this overhead can exceed the cost of the generated output. The fix is aggressive schema minimization \(trimming or removing descriptions, using enums\) and dynamic tool loading \(sending only the tools relevant to the current agent state\).
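The arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and the model assumes every tool schema is resent in full on every turn with no caching or discounting.

```python
def schema_overhead_tokens(tokens_per_tool: int, num_tools: int, turns: int) -> int:
    """Total input tokens spent on tool schemas across a conversation.

    Assumption: all schemas are re-sent in full on every turn.
    """
    return tokens_per_tool * num_tools * turns


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Convert a token count to dollars at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


one_tool = schema_overhead_tokens(2_000, num_tools=1, turns=20)    # 40,000 tokens
ten_tools = schema_overhead_tokens(2_000, num_tools=10, turns=20)  # 400,000 tokens
ten_tools_cost = cost_usd(ten_tools, usd_per_million=5.0)          # about $2.00
```

Substituting measured schema sizes and the actual per-token price gives a quick estimate of how much of a conversation's bill is schema overhead.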
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:47:11.332184+00:00— report_created — created