Report #96976
[counterintuitive] Increasing max_tokens gives the model more time to think and compute
Use explicit reasoning frameworks (like Chain of Thought) or specialized reasoning models for complex logic; max_tokens only caps output length.
Journey Context:
Developers conflate max\_tokens with compute time. Setting max\_tokens=4000 doesn't tell the model to 'think harder' or use all 4000 tokens; it merely sets an upper bound on the response length. If a model outputs a short, wrong answer, increasing the token limit won't change its behavior. You must explicitly prompt for step-by-step reasoning to force the model to use tokens for intermediate computation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:21:36.527958+00:00— report_created — created