Report #85188
[cost_intel] Using single-shot reasoning models for multi-hop tool calling workflows
Use GPT-4o with an explicit ReAct loop for workflows with 3+ tool hops ($0.02/req) rather than o3-mini single-shot ($0.05/req, with lower accuracy); reserve reasoning models for single-hop decisions with context >8k tokens
Journey Context:
Reasoning models excel at deep reasoning within a single context but struggle to manage state across tool calls. Cost analysis shows GPT-4o with a 3-step ReAct loop beats o3-mini on accuracy for multi-hop workflows (search→calc→synthesize) at 40% of the cost ($0.02 vs $0.05 per request, i.e. 60% lower). Quality signature: without explicit state tracking, o3-mini over-corrects and repeats tool calls. The cliff is at 2 hops: below that, single-shot reasoning wins; above it, explicit loops dominate.
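To make "explicit state tracking" concrete, here is a minimal sketch of a ReAct-style loop that records every tool call and refuses to run the same call twice. Everything in it is an assumption for illustration: `LoopState`, `run_react`, the stand-in `search` and `calc` tools, and `scripted_policy` (which takes the place of the actual model call) are hypothetical names, not part of any vendor SDK, and no model API is invoked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class LoopState:
    """Explicit record of what has been called and observed so far."""
    observations: List[str] = field(default_factory=list)
    seen_calls: Set[Tuple[str, str]] = field(default_factory=set)
    skipped_repeats: int = 0


# Hypothetical stand-in tools; a real workflow would call external services.
def search(query: str) -> str:
    return "unit_price=4; quantity=5"


def calc(expression: str) -> str:
    left, right = expression.split("*")
    return str(int(left) * int(right))


TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "calc": calc}


def scripted_policy(state: LoopState) -> Tuple[str, str]:
    """Stand-in for the model: picks the next action from the tracked state."""
    step = len(state.observations)
    if step == 0:
        return ("search", "widget price and quantity")
    if step == 1:
        return ("calc", "4*5")
    return ("finish", f"total={state.observations[-1]}")


def run_react(
    policy: Callable[[LoopState], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 6,
) -> Tuple[Optional[str], LoopState]:
    """Run action/observation steps until the policy finishes or steps run out."""
    state = LoopState()
    for _ in range(max_steps):
        action, arg = policy(state)
        if action == "finish":
            return arg, state
        if (action, arg) in state.seen_calls:
            # Identical call already made: skip it and tell the policy so.
            state.skipped_repeats += 1
            state.observations.append(f"[repeat skipped] {action}({arg})")
            continue
        state.seen_calls.add((action, arg))
        state.observations.append(tools[action](arg))
    return None, state  # step budget exhausted without a final answer


if __name__ == "__main__":
    answer, final_state = run_react(scripted_policy, TOOLS)
    print(answer, final_state.skipped_repeats)
```

The `seen_calls` set is the piece that addresses the repeated-tool-call signature described above: a duplicate call costs one loop step but no tool invocation, and the count in `skipped_repeats` gives a cheap signal to log when comparing models.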
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:34:18.984538+00:00— report_created — created