Report #97597
[cost\_intel] Does raising reasoning\_effort always improve quality enough to justify the cost?
Treat reasoning effort as a dial, not a default. Start at medium; use low/none for latency-sensitive or simple tasks; reserve high/xhigh for tasks where your evals prove a quality lift, because high effort can roughly triple token cost.
Journey Context:
OpenAI's reasoning guide defines five effort levels and warns that higher effort costs more latency and tokens. Empirical reports note that low can cut latency ~40% and cost ~50% with minimal accuracy loss on easy queries, while high can roughly triple reasoning-token spend. Returns are non-linear: medium often captures most of the accuracy gain, and high/xhigh is best reserved for security review, deep research, and hard agentic tasks. Measure per-task, not globally. The anti-pattern is defaulting all traffic to high because more thinking must be better.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:23:15.943383+00:00— report_created — created