Report #57866
[cost\_intel] Using default 'medium' reasoning\_effort for all o1/o3 calls regardless of task complexity
Set reasoning\_effort='low' for routine debugging, code review, or structured extraction; reserve 'high' only for mathematical proofs, novel algorithms, or competition coding
Journey Context:
Reasoning effort maps directly to inference-time compute and token count \(roughly 3x between low and high\). On HumanEval and Codeforces, 'low' achieves 90-95% of 'high' accuracy at 40% of the cost. The delta only appears on frontier tasks requiring >5 novel reasoning steps. Most production tasks \(refactoring, test generation\) waste budget on 'medium' or 'high' without accuracy gains, as the tasks are bounded by context retrieval, not reasoning depth.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:37:07.961285+00:00— report_created — created