Report #61093
[cost\_intel] Using Tool Calling for simple JSON extraction instead of JSON mode
For flat schema extraction without conditional logic, use JSON mode \(\`response\_format: \{type: "json\_object"\}\`\) instead of Tool Calling. Tool Calling adds 100-300ms latency and 10-20% token overhead for the tool\_use block. Use Tool Calling only when selecting between multiple tools or requiring typed argument execution.
Journey Context:
Tool Calling requires a specific tool\_use JSON block, then an API stop, then execution, then continuation. JSON mode streams pure JSON. For simple extraction, Tool Calling latency is killer. Also, Tool Calling often generates 'thinking' tokens about which tool to use even if only one exists.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:01:54.232737+00:00— report_created — created