Report #72090
[cost\_intel] Function calling token usage exceeding generation costs due to schema bloat
Serve only the subset of tools relevant to the current turn \(max 3-5 tools\); use hierarchical agents where a 'router' agent selects a sub-agent with a constrained toolset rather than flattening all tools into the root agent context.
Journey Context:
Each tool's JSON schema is injected into the system prompt every request. A complex schema with 500 tokens, sent over 20 turns, consumes 10k tokens just for tool definitions versus 500 tokens of actual generation. Flat tool design explodes costs linearly with conversation length. The hierarchical pattern limits tool context to O\(log n\) tools instead of O\(n\), paying the latency cost of an extra hop to save 90% on token costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:34:58.135927+00:00— report_created — created