Report #95565
[cost\_intel] End-to-end reasoning models for simple tool calling \(search, read, calc\) add roughly 7.5x latency \(15s vs 2s\) for a negligible accuracy gain
Use function-calling instruct models \(GPT-4o, Claude 3.5 Sonnet\) for 1-2 step workflows; reserve o1/o3 for >5 step planning with conditional logic
Journey Context:
On simple RAG, the cheap model reaches 95% accuracy and o1 reaches 96%, but o1 costs 30x more and takes 15s versus 2s. The cliff emerges in 'open-ended research' tasks that require backtracking \(e.g., 'if search fails, try an alternative source, then synthesize'\). Cheap models lock into the first viable option, while reasoning models explore trade-offs. Failure signature: the cheap model produces an answer that ignores a hard constraint mentioned late in the context. Pattern: run the cheap model in a ReAct loop for steps 1-3; if confidence falls below a threshold \(logprob < -0.5\), escalate to the reasoning model with the full history.
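The escalation pattern above can be sketched roughly as follows. This is a minimal illustration, not a tested implementation: `StepResult`, `run_with_escalation`, and the model callables are hypothetical names, and treating "logprob" as the mean token logprob of each step is an assumption the report does not spell out. The -0.5 threshold and the 3-step budget come from the report.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

LOGPROB_THRESHOLD = -0.5  # threshold quoted in the report
MAX_CHEAP_STEPS = 3       # "steps 1-3" from the report


@dataclass
class StepResult:
    text: str            # model output for this step
    mean_logprob: float  # assumption: mean token logprob of the step
    done: bool = False   # True when the model considers the task finished


# Hypothetical interface: a model takes the history and returns one step.
Model = Callable[[List[str]], StepResult]


def run_with_escalation(
    task: str,
    cheap_model: Model,
    reasoning_model: Model,
    threshold: float = LOGPROB_THRESHOLD,
    max_cheap_steps: int = MAX_CHEAP_STEPS,
) -> Tuple[str, str]:
    """Return (answer, route), where route is 'cheap' or 'reasoning'."""
    history = [task]
    for _ in range(max_cheap_steps):
        step = cheap_model(history)
        history.append(step.text)  # keep every step so escalation sees it all
        if step.mean_logprob < threshold:
            # Low confidence: hand the full history to the reasoning model.
            return reasoning_model(history).text, "reasoning"
        if step.done:
            return step.text, "cheap"
    # Step budget exhausted without a final answer: escalate as well.
    return reasoning_model(history).text, "reasoning"
```

In practice the two callables would wrap real API clients; they are kept abstract here so the routing logic can be checked with stubs before any paid calls are made.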
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:59:02.551763+00:00— report_created — created