Report #59328
[cost_intel] Using expensive reasoning models for every step in agentic planning
Use a cheap instruct model (4o-mini) for intent classification and parameter extraction; reserve o1 for plan generation when ambiguity exceeds a threshold (detected via confidence scores or contradiction flags).
Journey Context:
Running o1 on every agent step costs $0.50-2.00 per user request, versus $0.02 with smart routing. For clear intent classification ('book a flight to NYC'), 4o-mini suffices with 99% accuracy. For a request like 'I need to visit 3 cities with budget constraints and visa issues,' o1 is required for constraint satisfaction. The implementation pattern: the fast path uses 4o-mini with logprob thresholding. If the top-2 token probabilities are close (high entropy, above a set threshold) or the output parses into conflicting tool calls, escalate to o1 for that specific planning sub-tree only. This preserves 10x cost savings on the 80% of cases that are easy.
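A minimal sketch of the escalation check described above, in Python. It assumes the fast-path call has already returned the top-k logprobs for the decisive token and a parsed list of tool calls. Everything named here (`ToolCall`, `choose_model`, the threshold defaults, and the rule that "same tool, different arguments" counts as a conflict) is hypothetical and for illustration only; the model strings are plain labels, not verified API identifiers.

```python
import json
import math
from dataclasses import dataclass

# Labels only; map them to real model identifiers in your own client code.
CHEAP_MODEL = "4o-mini"
REASONING_MODEL = "o1"


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical parsed tool call from the fast-path output."""
    name: str
    args: dict


def _probs(top_logprobs):
    """Convert top-k logprobs to probabilities renormalized over the top-k."""
    raw = [math.exp(lp) for lp in top_logprobs]
    total = sum(raw)
    return [p / total for p in raw]


def normalized_entropy(top_logprobs):
    """Shannon entropy of the top-k distribution, scaled to [0, 1]."""
    if len(top_logprobs) < 2:
        return 0.0
    probs = _probs(top_logprobs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def top2_margin(top_logprobs):
    """Gap between the two most likely candidates (1.0 if only one)."""
    if len(top_logprobs) < 2:
        return 1.0
    probs = sorted(_probs(top_logprobs), reverse=True)
    return probs[0] - probs[1]


def has_conflicting_calls(calls):
    """Assumed conflict rule: the same tool requested with different args."""
    seen = {}
    for call in calls:
        key = json.dumps(call.args, sort_keys=True)
        if seen.setdefault(call.name, key) != key:
            return True
    return False


def choose_model(top_logprobs, calls, max_entropy=0.5, min_margin=0.2):
    """Return the model label to use for this planning sub-tree."""
    if not top_logprobs:
        return REASONING_MODEL  # no confidence signal: take the safe path
    uncertain = (
        normalized_entropy(top_logprobs) > max_entropy
        or top2_margin(top_logprobs) < min_margin
    )
    if uncertain or has_conflicting_calls(calls):
        return REASONING_MODEL
    return CHEAP_MODEL
```

The thresholds are placeholders and would need tuning against labeled traffic. The conflict rule is deliberately crude: a request that legitimately needs two calls to the same tool would also be escalated, which errs on the expensive side.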
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:04:27.323563+00:00— report_created — created