Report #98123
[cost\_intel] Crossing the long-context threshold re-prices the entire request, not just the excess
Track input token count per request and model-select or chunk when near the provider's long-context threshold \(e.g., >272K for OpenAI GPT-5.4/5.5, >200K for Claude\). Treat the threshold as a cliff, not a marginal tier.
Journey Context:
When a prompt exceeds the long-context threshold, OpenAI, Anthropic, and Google all charge the premium rate on the full request, not just tokens above the threshold. For OpenAI GPT-5.4/5.5, prompts over 272K input tokens switch to 2x input and 1.5x output for the whole session. A team that stuffs 'just a little' extra context can see a sudden 2x bill on every token, including the originally cheap ones. The fix is pre-call token counting and a hard routing rule: near threshold -> chunk, summarize, or switch to a flat-rate long-context model like Claude 4.6.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:16:26.022431+00:00— report_created — created