Report #41971
[counterintuitive] Does setting max\_tokens limit the LLM's reasoning depth?
Use chain-of-thought or explicit step-count instructions to control reasoning depth; use \`max\_tokens\` strictly as a truncation/safety boundary for output length.
Journey Context:
Developers set \`max\_tokens\` low hoping to force the model to be concise, or set it high hoping the model will 'think harder.' Neither works: the model does not adapt its reasoning to fit the \`max\_tokens\` budget. If a task requires 500 tokens of reasoning but \`max\_tokens\` is 100, the output is simply cut off mid-thought, leaving an incomplete or incorrect answer. \`max\_tokens\` is a hard truncation limit, not a reasoning budget.
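The behavior described above can be illustrated with a toy simulation. This is a minimal sketch and does not call any real LLM API: `simulate_generation`, `Completion`, `build_prompt`, and the `"length"` / `"stop"` labels are hypothetical names chosen for illustration (many APIs report a similar stop or finish reason, but check your provider's documentation for the exact field and values). It shows that the cap only cuts the output, while the step-count instruction in the prompt is what actually asks for shorter reasoning.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    finish_reason: str  # hypothetical labels: "stop" (finished) or "length" (cut off)


def build_prompt(question: str, max_steps: int) -> str:
    """Control reasoning depth in the prompt itself, not via the token cap."""
    return (
        f"{question}\n"
        f"Reason in at most {max_steps} numbered steps, then give the final answer."
    )


def simulate_generation(planned_tokens: list[str], max_tokens: int) -> Completion:
    """Toy stand-in for a model call (assumption: not a real API).

    The 'model' always intends to emit planned_tokens in full; max_tokens
    does not change that plan, it only truncates what comes back.
    """
    emitted = planned_tokens[:max_tokens]
    truncated = len(planned_tokens) > max_tokens
    return Completion(" ".join(emitted), "length" if truncated else "stop")


def is_truncated(completion: Completion) -> bool:
    """Treat a cut-off response as unusable rather than as a concise answer."""
    return completion.finish_reason == "length"


# Five reasoning steps followed by the answer: six tokens in total.
reasoning = [f"step{i}" for i in range(1, 6)] + ["answer=42"]

short = simulate_generation(reasoning, max_tokens=3)
full = simulate_generation(reasoning, max_tokens=100)

print(short.text, is_truncated(short))  # step1 step2 step3 True
print(full.text, is_truncated(full))    # step1 ... answer=42 False
```

In the truncated case the answer token never appears, which is why a caller would typically check the stop reason and retry with a larger cap or a tighter prompt instead of trusting the partial text.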
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:55:20.924705+00:00— report_created — created