Report #64060
[cost_intel] At what point does the cost of reasoning models in multi-step agent loops become prohibitive compared to error rates?
Do not use reasoning models for every step of a ReAct-style agent loop. Reserve them for the 'Plan' and 'Reflect' phases, and use instruct models for 'Act' (tool execution) and simple 'Observe' steps. Break-even is around 3-5 tool calls per task: beyond that, the cascading cost of reasoning at every step ($0.50-$2.00 per full trajectory) exceeds the cost of the errors an instruct model makes when paired with simple retry logic. Quality signature: if the agent can make irreversible errors (for example, a wrong DELETE call), add a reasoning-model verification step before those calls.
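The routing rule above can be sketched as a small lookup. This is a minimal illustration, not a reference implementation: the `Phase` enum, the tier labels, and the tool names in `IRREVERSIBLE_TOOLS` are all hypothetical and should be replaced with whatever your agent framework actually uses.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    PLAN = "plan"
    ACT = "act"
    OBSERVE = "observe"
    REFLECT = "reflect"


# Hypothetical tier labels; map them to the models you actually deploy.
REASONING = "reasoning"
INSTRUCT = "instruct"

# Hypothetical names of tools whose side effects cannot be undone.
IRREVERSIBLE_TOOLS = {"delete_record", "drop_table"}


def pick_tier(phase: Phase) -> str:
    """Reasoning for Plan and Reflect, instruct for Act and Observe."""
    if phase in (Phase.PLAN, Phase.REFLECT):
        return REASONING
    return INSTRUCT


def needs_verification(tool_name: str) -> bool:
    """True when a reasoning-model check should run before the tool call."""
    return tool_name in IRREVERSIBLE_TOOLS


def tiers_for_tool_call(tool_name: str) -> List[str]:
    """Tiers used for one 'Act' step, in the order they would run."""
    tiers = []
    if needs_verification(tool_name):
        tiers.append(REASONING)  # verify first, then execute
    tiers.append(pick_tier(Phase.ACT))
    return tiers
```

Under these assumptions a call like `tiers_for_tool_call("search_web")` stays entirely on the instruct tier, and only the tools listed as irreversible pay for a reasoning pass.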
Journey Context:
Agent loop cost grows at least linearly with the number of steps, because every step pays for its context again. Reasoning models cost 10-30x more per token, and agent contexts are long (10k+ tokens per step). A 10-step agent with o1 costs ~$1.50-$3.00 vs $0.05 with GPT-4o. Common mistake: routing simple 'search_web' calls through o1, which is a massive waste. The 'Plan' phase benefits from reasoning because it involves long-horizon planning, but 'Act' is usually deterministic tool execution. The irreversible-error signature: when a tool has side effects (a database write, an API call that changes state), reasoning reduces catastrophic errors. Alternative: a 'critic' pattern, where a cheap model acts and a reasoning model reviews the action before it is committed.
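The arithmetic behind these figures can be checked with a rough estimator. The sketch below assumes a flat number of tokens per step and a single blended price per million tokens. The two rates are illustrative placeholders chosen to sit 20x apart (inside the 10-30x range quoted above), not actual vendor prices, and the function names are hypothetical.

```python
# Illustrative placeholder rates in USD per million tokens (NOT real prices).
INSTRUCT_RATE = 1.0
REASONING_RATE = 20.0  # 20x the instruct rate, within the quoted 10-30x range


def trajectory_cost(steps: int, tokens_per_step: int, rate: float) -> float:
    """Cost in USD when every step runs on one model at `rate` per 1M tokens."""
    return steps * tokens_per_step * rate / 1_000_000


def mixed_cost(steps: int, reasoning_steps: int, tokens_per_step: int) -> float:
    """Cost in USD when only `reasoning_steps` of the steps use the reasoning tier."""
    if not 0 <= reasoning_steps <= steps:
        raise ValueError("reasoning_steps must be between 0 and steps")
    instruct_steps = steps - reasoning_steps
    return (
        trajectory_cost(reasoning_steps, tokens_per_step, REASONING_RATE)
        + trajectory_cost(instruct_steps, tokens_per_step, INSTRUCT_RATE)
    )


if __name__ == "__main__":
    steps, tokens = 10, 10_000
    all_reasoning = trajectory_cost(steps, tokens, REASONING_RATE)
    all_instruct = trajectory_cost(steps, tokens, INSTRUCT_RATE)
    plan_and_reflect_only = mixed_cost(steps, 2, tokens)
    print(f"all reasoning:       ${all_reasoning:.2f}")
    print(f"all instruct:        ${all_instruct:.2f}")
    print(f"plan + reflect only: ${plan_and_reflect_only:.2f}")
```

Substituting your provider's current input and output prices, and a per-step token count measured from real trajectories, gives a more realistic comparison. If each step re-sends a growing history, the flat `tokens_per_step` assumption will understate the total.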
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:00:38.166136+00:00— report_created — created