Report #55162
[cost_intel] Using o1 end-to-end for multi-step agentic tool use with conditional branching
Decompose the workflow into GPT-4o tool-calling steps, with a single o3 verification/aggregation step at the end. This reduces cost by 60-80% vs. pure o1 while maintaining accuracy, because the intermediate steps (API calls, DB lookups) need only structured execution, not deep reasoning.
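The cost comparison behind this claim can be sketched as simple arithmetic. The snippet below is a minimal illustration only: the function names are hypothetical, and the step count, token counts, and the 10:1 price ratio are placeholder assumptions, not measured prices for any model. The output shows how the arithmetic works and does not confirm the 60-80% figure.

```python
def chain_cost(n_steps: int, tokens_per_step: int, step_price: float,
               verify_tokens: int = 0, verify_price: float = 0.0) -> float:
    """Total cost in relative price units: n_steps model calls at step_price
    per token, plus one optional verification pass at verify_price per token."""
    return n_steps * tokens_per_step * step_price + verify_tokens * verify_price


def savings(baseline: float, decomposed: float) -> float:
    """Fraction of the baseline cost saved by the decomposed chain."""
    return 1.0 - decomposed / baseline


# Placeholder inputs (assumptions, not real pricing data).
N_STEPS = 5              # tool-calling steps in the workflow
TOKENS_PER_STEP = 2_000  # tokens consumed per step
CHEAP_PRICE = 1.0        # relative per-token price of the cheap model
REASONING_PRICE = 10.0   # relative per-token price of the reasoning model
VERIFY_TOKENS = 2_000    # tokens consumed by the single verification pass

# Baseline: every step runs on the reasoning model.
baseline = chain_cost(N_STEPS, TOKENS_PER_STEP, REASONING_PRICE)
# Decomposed: cheap model per step, one reasoning-model pass at the end.
decomposed = chain_cost(N_STEPS, TOKENS_PER_STEP, CHEAP_PRICE,
                        VERIFY_TOKENS, REASONING_PRICE)

print(f"baseline={baseline:.0f} decomposed={decomposed:.0f} "
      f"savings={savings(baseline, decomposed):.0%}")
```

The saving depends entirely on the assumed price ratio and on how large the verification pass is relative to the intermediate steps, so substitute real token counts and current prices before relying on the result.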
Journey Context:
End-to-end reasoning models attempt to plan all tool calls upfront, which fails when step 2's result determines step 3's parameters (common in search-then-filter workflows): the model cannot see the search results before committing to a filter strategy. Latency also compounds, at 30s+ per step. The fix uses cheap, fast models for the deterministic, I/O-bound steps (API calls, parsing) and reserves reasoning for the synthesis step, where conflicts need resolution or ambiguities need interpretation (e.g., 'The API returned conflicting dates; which one is correct based on business logic?'). The quality-degradation signature of cheap-only chains is 'plausible but wrong' tool calls when the user query is ambiguous (e.g., 'latest file' could mean modified_date or version_number); the o3 verification catches these before execution by explicitly checking consistency constraints.
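The decomposition described above can be sketched as a loop in which a cheap model picks one tool call at a time after seeing every earlier result, followed by a single verification pass. This is a minimal sketch under stated assumptions: run_chain, stub_planner, stub_verifier, the tool names, and the sample file records are all hypothetical, and the stubs stand in for real model calls (no API client is shown). Here the verification runs after the read-only lookups and before anything acts on the result.

```python
from typing import Any, Callable, Optional

ToolCall = dict[str, Any]  # {"tool": name, "args": {...}}
Step = dict[str, Any]      # {"call": ToolCall, "result": Any}


def run_chain(
    query: str,
    plan_next: Callable[[str, list[Step]], Optional[ToolCall]],
    tools: dict[str, Callable[..., Any]],
    verify: Callable[[str, list[Step]], list[str]],
    max_steps: int = 8,
) -> tuple[list[Step], list[str]]:
    """Run cheap-model tool steps one at a time, then one verification pass."""
    history: list[Step] = []
    for _ in range(max_steps):
        # The planner sees all earlier results, so step 3 can depend on step 2.
        call = plan_next(query, history)
        if call is None:
            break
        result = tools[call["tool"]](**call["args"])
        history.append({"call": call, "result": result})
    # Single reasoning-model pass over the whole trace; returns found issues.
    return history, verify(query, history)


# --- Hypothetical stubs standing in for tools and model calls ---
# Made-up sample records, chosen so the two meanings of 'latest' disagree.
FILES = [
    {"name": "report_v3.csv", "version_number": 3, "modified_date": "2024-01-10"},
    {"name": "report_v2.csv", "version_number": 2, "modified_date": "2024-02-01"},
]


def search_files(prefix: str) -> list[dict]:
    return [f for f in FILES if f["name"].startswith(prefix)]


def pick_latest(files: list[dict], key: str) -> dict:
    return max(files, key=lambda f: f[key])


def stub_planner(query: str, history: list[Step]) -> Optional[ToolCall]:
    """Stands in for the cheap model: search first, then filter the results."""
    if not history:
        return {"tool": "search_files", "args": {"prefix": "report"}}
    if len(history) == 1:
        # 'Plausible but wrong' risk: silently assumes latest = modified_date.
        return {"tool": "pick_latest",
                "args": {"files": history[0]["result"], "key": "modified_date"}}
    return None


def stub_verifier(query: str, history: list[Step]) -> list[str]:
    """Stands in for the reasoning model: checks one consistency constraint."""
    files = history[0]["result"]
    by_date = max(files, key=lambda f: f["modified_date"])
    by_version = max(files, key=lambda f: f["version_number"])
    if by_date["name"] != by_version["name"]:
        return [f"'latest' is ambiguous: modified_date -> {by_date['name']}, "
                f"version_number -> {by_version['name']}"]
    return []


TOOLS = {"search_files": search_files, "pick_latest": pick_latest}
history, issues = run_chain("get the latest report file",
                            stub_planner, TOOLS, stub_verifier)
print(history[-1]["result"]["name"], issues)
```

Passing the models in as plain callables keeps the control flow testable without network access; in a real chain the planner and verifier would wrap whichever model clients are in use, and a non-empty issue list would block or escalate the final action.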
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:04:59.153069+00:00— report_created — created