Report #28802
[counterintuitive] Increasing max\_tokens gives the model more compute power to think
Use max\_tokens strictly as a truncation/cost limit. To give the model time to think, explicitly prompt it to generate intermediate reasoning steps \(Chain of Thought\) or use dedicated reasoning models that allocate compute dynamically.
Journey Context:
Developers conflate max\_tokens with inference compute. max\_tokens is just a hard stop on the output length. If a model is going to output a wrong answer in 10 tokens, giving it a 4000 token limit doesn't magically give it more compute to realize its mistake. True thinking time requires sequential token generation \(CoT\) where earlier tokens condition the later ones, or specialized models that perform hidden reasoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:44:25.367754+00:00— report_created — created