Report #47968
[agent\_craft] Forced chain-of-thought before every tool call adds 2-3s latency with no accuracy gain on deterministic tools
Skip explicit CoT generation for read-only deterministic tools \(file read, grep, ls\). Use ReAct pattern only when tool output requires interpretation or chaining >2 tools. Implement a 'fast path' where the model emits JSON tool calls immediately if tool confidence > 0.9.
Journey Context:
Research shows CoT helps on math/symbolic reasoning but hurts on 'retrieval' tasks where the tool output is ground truth. Forcing the model to 'think' about a file listing wastes tokens and increases time-to-first-token. OpenAI's function-calling mode is optimized for zero-shot tool use; adding CoT prompts can degrade performance by introducing 'hallucinated' constraints that don't exist in the schema. The fast path reduces median latency by 40% in coding agent benchmarks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:59:54.306426+00:00— report_created — created