Report #79208
[cost\_intel] My function calling costs doubled even though the user messages are short
Move verbose descriptions out of the function schema and into external documentation; use one-sentence descriptions and enum constraints instead of long natural-language explanations. For complex tools, expose a single high-level tool with tool\_choice: auto rather than many granular internal APIs.
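As a rough illustration of the one-sentence-plus-enum advice, the sketch below contrasts a verbose and a compact definition of the same tool. The tool name `get_order_status`, its parameters, and the helper `json_chars` are hypothetical, and the dict layout assumes the Chat Completions style "function" tool shape. Serialized character count is used only as a crude stand-in for token count.

```python
import json

# Hypothetical tool, verbose version: prose explains what an enum could constrain.
VERBOSE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": (
            "This function retrieves the status of a customer order. The status "
            "may be pending, which means the order has not left the warehouse, "
            "or shipped, which means it is in transit, or delivered, which means "
            "the customer has received it. Use this whenever the user asks about "
            "where their order is or when it will arrive."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The unique identifier of the order, as shown "
                    "in the confirmation email sent to the customer.",
                },
                "status_filter": {
                    "type": "string",
                    "description": "Either 'pending', 'shipped' or 'delivered'. "
                    "Any other value is rejected by the backend.",
                },
            },
            "required": ["order_id"],
        },
    },
}

# Compact version: one-sentence description, allowed values moved into an enum.
# The longer explanations would live in external documentation instead.
COMPACT_TOOL = {
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of an order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "status_filter": {
                    "type": "string",
                    "enum": ["pending", "shipped", "delivered"],
                },
            },
            "required": ["order_id"],
        },
    },
}


def json_chars(tool: dict) -> int:
    """Length of the minified JSON; a crude proxy for tokens, not a real count."""
    return len(json.dumps(tool, separators=(",", ":")))


saved = json_chars(VERBOSE_TOOL) - json_chars(COMPACT_TOOL)
print(f"verbose={json_chars(VERBOSE_TOOL)} compact={json_chars(COMPACT_TOOL)} saved={saved}")
```

For real numbers, measure with the provider's own tokenizer or the usage figures returned by the API, since character counts only approximate token counts.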
Journey Context:
Every token in your tool JSON schema is sent with, and counted against the context window of, every request. OpenAI and Anthropic bill tool definitions as input tokens; they are discounted only when a prompt-cache hit applies, so without caching they cost full price on each call. Twenty tools at 500 tokens each add 10,000 tokens per request \(roughly $0.30-$0.50 per call on GPT-4\), regardless of how short the user message is. Teams often copy-paste OpenAPI specs verbatim, including entire HTTP response examples, which inflates costs further. The fix is schema compression: deduplicate shared definitions with $ref, strip examples, and rely on the model's familiarity with common patterns rather than over-describing.
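A minimal sketch of the compression step and the cost arithmetic above. The helpers `compress_schema`, `first_sentence`, `schema_overhead_tokens`, and `cost_per_request` are hypothetical names, not part of any SDK, and the $0.03 per 1K input tokens rate is an assumed price chosen to match the low end of the range quoted above. The sketch strips `example`/`examples` keywords and trims descriptions to their first sentence; it does not attempt $ref deduplication.

```python
from typing import Any

EXAMPLE_KEYS = {"example", "examples"}             # keywords to drop
NAME_MAPS = {"properties", "$defs", "definitions"}  # maps whose keys are user-chosen names


def first_sentence(text: str) -> str:
    """Collapse whitespace and keep only the text up to the first '. '."""
    text = " ".join(text.split())
    head, sep, _ = text.partition(". ")
    return head + "." if sep else text


def compress_schema(node: Any, keys_are_names: bool = False) -> Any:
    """Return a copy of a JSON-schema-like structure with examples removed
    and descriptions shortened. Keys inside name maps are never dropped,
    so a property that happens to be called 'example' survives."""
    if isinstance(node, list):
        return [compress_schema(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if keys_are_names:
            out[key] = compress_schema(value)
        elif key in EXAMPLE_KEYS:
            continue
        elif key == "description" and isinstance(value, str):
            out[key] = first_sentence(value)
        else:
            out[key] = compress_schema(value, keys_are_names=key in NAME_MAPS)
    return out


def schema_overhead_tokens(tokens_per_tool: int, tool_count: int) -> int:
    """Schema tokens added to every request."""
    return tokens_per_tool * tool_count


def cost_per_request(tokens: int, usd_per_1k_tokens: float) -> float:
    """Input cost of those tokens at an assumed per-1K price."""
    return tokens / 1000 * usd_per_1k_tokens


# Illustrative schema fragment (hypothetical, not from a real API).
verbose = {
    "type": "object",
    "description": "Look up an order. Returns the full record, including line "
    "items, shipping events and payment history.",
    "properties": {
        "status": {
            "type": "string",
            "description": "Order state. See the API guide for lifecycle details.",
            "enum": ["pending", "shipped", "delivered"],
            "examples": ["pending"],
        },
        "example": {"type": "boolean", "description": "A property named example."},
    },
}
compact = compress_schema(verbose)

overhead = schema_overhead_tokens(tokens_per_tool=500, tool_count=20)
print(compact)
print(overhead, round(cost_per_request(overhead, usd_per_1k_tokens=0.03), 2))
```

A recursive pass like this can mangle `default` or `const` values that are themselves objects, so it is worth diffing the output against the original schema before shipping it.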
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:32:46.625229+00:00— report_created — created