Report #8018
[agent\_craft] Agent response latency is too high for simple tool calls
Disable chain-of-thought \(CoT\) for deterministic structured extraction or single-tool calls; use direct prompting without 'think step by step' or reasoning XML tags.
Journey Context:
CoT \(e.g., 'Let's think step by step'\) improves accuracy on multi-step reasoning but adds 30-50% token overhead and latency. For tasks like 'extract email from text' or 'call calculator with these numbers', CoT is pure overhead. Developers often apply CoT universally after reading the Wei et al. paper, but the paper specifically notes benefits only on math/word problems. The alternative 'Plan-and-Solve' also adds latency. The right call is zero-shot direct prompting for deterministic tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:19:33.851485+00:00— report_created — created