Report #94338
[cost_intel] Using reasoning models for every tool call wastes 50x in latency and cost
Use a cheap model for tool execution (API calls, DB queries); reserve reasoning models for planning, and only when the tool dependency graph has more than 3 parallel branches or requires backtracking.
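A minimal Python sketch of the routing rule above. It assumes the tool plan is already available as a dependency map (tool name to the tools it depends on) and that the caller knows whether the plan needs backtracking. Counting "parallel branches" as the widest layer of a topological sort is one possible reading of the rule, not something the report specifies; the model name strings and the functions `max_parallel_branches` and `choose_planner` are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

# Hypothetical model identifiers; swap in whatever your provider exposes.
CHEAP_MODEL = "gpt-4o-mini"
REASONING_MODEL = "o1"


def max_parallel_branches(deps: dict[str, list[str]]) -> int:
    """Widest set of tools that could run at the same time.

    `deps` maps each tool to the tools it depends on. Tools are peeled off
    layer by layer (Kahn-style, via graphlib); the largest layer is taken
    as the number of parallel branches.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is cyclic
    widest = 0
    while sorter.is_active():
        ready = sorter.get_ready()  # all tools whose dependencies are finished
        widest = max(widest, len(ready))
        sorter.done(*ready)
    return widest


def choose_planner(
    deps: dict[str, list[str]],
    needs_backtracking: bool,
    branch_threshold: int = 3,
) -> str:
    """Escalate to the reasoning model only for wide or backtracking plans."""
    if needs_backtracking or max_parallel_branches(deps) > branch_threshold:
        return REASONING_MODEL
    return CHEAP_MODEL
```

For example, a search → fetch → summarize chain (`{"fetch": ["search"], "summarize": ["fetch"]}`) has a width of 1 and stays on the cheap model, while a plan that fans out to four independent fetches has a width of 4 and is escalated.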
Journey Context:
In agent architectures, o1 takes 15-30s to plan a tool sequence, while GPT-4o takes 2s. If the task is a simple chain of sequential calls (search → fetch → summarize), reasoning adds no accuracy, only about 10x the latency and 30x the cost. Reasoning pays off when the plan requires conditional logic, e.g. 'if API A returns empty, try B, then fetch C and D in parallel and validate the results before E'. Degradation signature of the cheap model: it executes the steps linearly, then fails when intermediate data is missing; the reasoning model instead builds an explicit DAG in its thought tokens. Cost per task for a simple agent loop: $0.05 with 4o-mini vs $1.50 with o1. Hybrid: use o1 only when heuristics detect plan complexity (multiple conditionals); otherwise use 4o-mini.
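The hybrid heuristic and the per-task cost figures can be sketched together in Python. This assumes plan complexity is judged from the task text by counting hypothetical cue words (`if`, `else`, `otherwise`, `unless`, `fallback`, `in parallel`, `validate`); the report does not say which heuristics it uses. The prices are the report's $0.05 and $1.50 per-task figures, and `COMPLEXITY_CUES`, `route`, and `blended_cost` are illustrative names.

```python
import re

# Per-task cost figures quoted in the report for a simple agent loop.
COST_PER_TASK = {"gpt-4o-mini": 0.05, "o1": 1.50}

# Hypothetical cue list standing in for "heuristics detect plan complexity";
# tune it against your own task logs.
COMPLEXITY_CUES = re.compile(
    r"\b(if|else|otherwise|unless|fallback|in parallel|validate)\b",
    re.IGNORECASE,
)


def count_conditionals(task: str) -> int:
    """Number of complexity cues found in the task description."""
    return len(COMPLEXITY_CUES.findall(task))


def route(task: str, min_conditionals: int = 2) -> str:
    """Plan with o1 only when the task shows several cues; else gpt-4o-mini."""
    if count_conditionals(task) >= min_conditionals:
        return "o1"
    return "gpt-4o-mini"


def blended_cost(tasks: list[str]) -> float:
    """Average per-task cost when each task is priced at its routed model."""
    if not tasks:
        return 0.0
    return sum(COST_PER_TASK[route(t)] for t in tasks) / len(tasks)
```

With these figures, routing 10% of tasks to o1 averages 0.1 × $1.50 + 0.9 × $0.05 = $0.195 per task, versus $1.50 if o1 plans everything. Requiring at least two cues keeps a single incidental "if" from triggering escalation.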
Lifecycle
2026-06-22T16:55:59.990018+00:00— report_created — created