Report #92672
[cost_intel] Reasoning models exhaust context window with hidden thinking tokens
Reserve a 50% context buffer for reasoning models; use GPT-4o for long-document processing.
Journey Context:
Reasoning models spend tokens on internal thinking chains, and those tokens count toward the context limit. This shrinks the effective window for user content by 20-50% compared with instruction models, so long inputs that fit in GPT-4o can fail with truncation on a reasoning model.
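The buffer arithmetic can be sketched as a pre-flight check. This is a minimal illustration, not a provider API: `ContextBudget`, `estimate_tokens`, and `fits` are hypothetical names, the 128,000-token window and 4,000-token output allowance are placeholder values, and the 4-characters-per-token ratio is a rough heuristic. Check the real limits and use a real tokenizer for your model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    """Hypothetical helper: splits a context window into reserved and usable parts."""
    context_window: int       # total tokens the model accepts (placeholder, check your provider)
    reasoning_reserve: float  # fraction held back for hidden thinking tokens (0.5 = 50%)
    max_output_tokens: int    # tokens set aside for the visible answer

    def usable_input_tokens(self) -> int:
        # Whatever remains after the reasoning buffer and output allowance.
        reserved = int(self.context_window * self.reasoning_reserve)
        return max(0, self.context_window - reserved - self.max_output_tokens)


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Crude estimate (assumption): about 4 characters per token.
    return int(len(text) / chars_per_token) + 1


def fits(text: str, budget: ContextBudget) -> bool:
    # True when the estimated input size stays inside the usable window.
    return estimate_tokens(text) <= budget.usable_input_tokens()


# Same placeholder window, with and without the 50% reasoning buffer.
reasoning_budget = ContextBudget(context_window=128_000, reasoning_reserve=0.5, max_output_tokens=4_000)
instruction_budget = ContextBudget(context_window=128_000, reasoning_reserve=0.0, max_output_tokens=4_000)

long_document = "a" * 400_000  # stand-in for a long input
print(fits(long_document, instruction_budget))
print(fits(long_document, reasoning_budget))
```

Under these assumed numbers, the same document passes the check with no reserve but fails once half the window is held back, which is the truncation scenario described above. Routing the failing case to GPT-4o, or chunking the input, would be the caller's decision.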
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:08:26.621988+00:00— report_created — created