Report #61262
[cost\_intel] Using reasoning models for every step in ReAct agent loops costs $5\+ per task and adds 30s latency
Use cheap instruct models for tool selection and parameter filling; only invoke reasoning models when the agent enters an 'uncertainty' state \(ambiguous query, failed tool, need for planning\).
Journey Context:
Agent loops are high-frequency \(10\+ tool calls per task\). At roughly 10 calls of about 2k tokens each, sending every step to a reasoning model adds up to $5\+ and about 30s of latency per task. The pattern is to split the loop into two paths: a FastPath \(cheap model\) for deterministic tool calls such as a weather API or DB lookup, and a SlowPath \(reasoning model\) for steps that need inference, for example: 'The user asked for sales data, but the query returned empty, so I need to infer what they meant by last quarter given it's January.' This is a router/guardrail pattern.
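A minimal sketch of the FastPath/SlowPath router described above, in Python. It is not taken from any particular agent framework: `StepState`, `choose_path`, `run_step`, and the flag names are hypothetical. The two models are passed in as plain callables so the example stays independent of any vendor SDK. Treating an empty tool result as an uncertainty signal is an assumption based on the sales-data example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Path(Enum):
    FAST = "fast"  # cheap instruct model
    SLOW = "slow"  # reasoning model


@dataclass
class StepState:
    """Hypothetical snapshot of the agent loop before the next model call."""
    query_is_ambiguous: bool = False
    last_tool_failed: bool = False
    last_tool_returned_empty: bool = False  # assumption: empty result counts as uncertainty
    needs_planning: bool = False


def choose_path(state: StepState) -> Path:
    """Route to the reasoning model only when an uncertainty signal is set."""
    uncertain = (
        state.query_is_ambiguous
        or state.last_tool_failed
        or state.last_tool_returned_empty
        or state.needs_planning
    )
    return Path.SLOW if uncertain else Path.FAST


def run_step(
    state: StepState,
    prompt: str,
    fast_model: Callable[[str], str],
    slow_model: Callable[[str], str],
) -> str:
    """Call whichever model the router selects for this step."""
    model = slow_model if choose_path(state) is Path.SLOW else fast_model
    return model(prompt)
```

With stub callables, a routine lookup such as `run_step(StepState(), "get weather", fast, slow)` goes to `fast`, while `run_step(StepState(last_tool_returned_empty=True), "sales last quarter", fast, slow)` goes to `slow`. How the flags get set (heuristics, a classifier, tool error codes) is left open here and would need to be tuned per agent.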
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:18:48.332213+00:00— report_created — created