Report #99574
[cost_intel] Leaving reasoning effort at default medium for every query
Treat reasoning effort as a per-query tuning knob. Use effort=none/low for latency-critical classification and retrieval; medium for agentic coding and research; and high/xhigh only for hard debugging, security review, or deep research where evals show measurable gains. Monitor reasoning_tokens against visible output tokens in the usage output.
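As a rough illustration of the monitoring step, the sketch below computes how much of a response's output was spent on hidden reasoning. It assumes a usage payload shaped like a dict with `output_tokens` and a nested `output_tokens_details.reasoning_tokens`, and that `output_tokens` already includes the reasoning tokens; check both assumptions against the usage object your API version actually returns. The function name `reasoning_share` is hypothetical.

```python
def reasoning_share(usage: dict) -> float:
    """Return the fraction of output tokens spent on reasoning.

    Assumed shape (verify against your API's usage object):
    {"output_tokens": int, "output_tokens_details": {"reasoning_tokens": int}}
    """
    output_tokens = usage.get("output_tokens", 0)
    details = usage.get("output_tokens_details") or {}
    reasoning_tokens = details.get("reasoning_tokens", 0)
    if output_tokens <= 0:
        # No output recorded, so there is nothing to attribute to reasoning.
        return 0.0
    return reasoning_tokens / output_tokens


# Example with made-up numbers: 750 of 1000 output tokens were reasoning.
sample_usage = {
    "output_tokens": 1000,
    "output_tokens_details": {"reasoning_tokens": 750},
}
share = reasoning_share(sample_usage)
visible_tokens = sample_usage["output_tokens"] - 750
```

A consistently high share on a query class that is answered just as well at lower effort is a signal that the class is a candidate for a cheaper setting.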
Journey Context:
OpenAI's reasoning guide maps effort levels to use cases: none for voice and fast retrieval, low for tool use and chat, medium for agentic coding and spreadsheets, and high/xhigh for deep planning and security review. Quality does not improve linearly with effort: easy problems saturate at low effort, while hard problems benefit from more thinking. The cost mistake is defaulting everything to medium or high and paying 2-5x more for queries that a low-effort reasoning call or a non-reasoning model would answer equally well. Profile your query distribution and set effort dynamically, based on task type or on a difficulty estimate from an initial cheap-model pass.
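One way to set effort dynamically is a small routing table keyed by task type, with an optional bump when a cheap difficulty estimate is high. The sketch below is illustrative only: the task-type labels, the `pick_effort` helper, the 0.8 escalation threshold, and the `medium` fallback are all assumptions, and the mapping should be tuned against your own evals. It only chooses the effort string; passing that value to the API is left out.

```python
from typing import Optional

# Ordered from cheapest to most expensive, using the effort names from the report.
EFFORT_LEVELS = ["none", "low", "medium", "high", "xhigh"]

# Hypothetical task labels mapped to the effort levels described above.
EFFORT_BY_TASK = {
    "voice": "none",
    "retrieval": "none",
    "classification": "low",
    "tool_use": "low",
    "chat": "low",
    "agentic_coding": "medium",
    "spreadsheet": "medium",
    "research": "medium",
    "deep_planning": "high",
    "hard_debugging": "high",
    "security_review": "high",
}


def pick_effort(
    task_type: str,
    difficulty: Optional[float] = None,
    fallback: str = "medium",
    escalate_at: float = 0.8,
) -> str:
    """Choose an effort level for one query.

    difficulty is an optional score in [0, 1], e.g. from a cheap model's
    first pass (assumed, not part of any API). Scores at or above
    escalate_at raise the effort by one level, capped at the top level.
    """
    effort = EFFORT_BY_TASK.get(task_type, fallback)
    if difficulty is not None and difficulty >= escalate_at:
        bumped = min(EFFORT_LEVELS.index(effort) + 1, len(EFFORT_LEVELS) - 1)
        effort = EFFORT_LEVELS[bumped]
    return effort


easy_lookup = pick_effort("retrieval")
hard_chat = pick_effort("chat", difficulty=0.9)
```

Keeping the table in data rather than in branching code makes it easy to revise as profiling shows which query classes actually benefit from more reasoning.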
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:22:18.651880+00:00— report_created — created