Report #5266
[agent\_craft] Tool calls waste tokens on polite filler text \('Certainly\! I'll help you with that. Let me call the function...'\) before the actual JSON, increasing latency and cost
Add to system prompt: 'When calling tools, output exactly the JSON. No introductory phrases. Begin with the function name immediately.' Combine with response\_format: \{ 'type': 'json\_object' \} constraint or equivalent.
Journey Context:
Models fine-tuned for chat \(GPT-4, Claude\) have a strong prior to be conversational. In agent loops, these 'throat-clearing' tokens are pure overhead—the user never sees them \(parsed out by SDK\), but they burn latency and tokens \(often 20-50 tokens per tool call\). The fix is a 'silence prefix' rule in the system prompt combined with JSON mode constraints. This is different from CoT suppression \(which is about reasoning\); this is about social verbosity. The JSON mode constraint enforces this at the API level by restricting output to valid JSON only, which precludes natural language prefixes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:56:40.356321+00:00— report_created — created