Report #92504
[cost\_intel] Long context >128k triggering non-linear 2-4x per-token pricing tiers unexpectedly
Implement client-side token counting and hard-truncate at 120k tokens to stay under the 128k pricing threshold; use summarization or RAG instead of filling the window.
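A minimal sketch of the client-side budget check described above, assuming a chat-style list of messages with `role` and `content` keys. The names `fit_to_budget` and `approx_token_count` are hypothetical, and the 4-characters-per-token estimate is only a placeholder: pass the provider's own tokenizer as `count_tokens` for real counts.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

SAFE_BUDGET = 120_000  # cap from this report, kept below the 128k threshold


def approx_token_count(text: str) -> int:
    """Rough placeholder (about 4 characters per token); not a real tokenizer."""
    return max(1, (len(text) + 3) // 4)


def fit_to_budget(
    messages: List[Message],
    budget: int = SAFE_BUDGET,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[Message]:
    """Keep a leading system message plus the newest messages that fit the budget."""
    if not messages:
        return []

    # Always keep the system message if it comes first.
    head = messages[:1] if messages[0].get("role") == "system" else []
    tail = messages[len(head):]

    used = sum(count_tokens(m["content"]) for m in head)
    if used > budget:
        raise ValueError("system message alone exceeds the token budget")

    # Walk backwards from the newest message and stop at the first one that
    # would push the total over the budget (a simple sliding window).
    kept: List[Message] = []
    for msg in reversed(tail):
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost

    kept.reverse()  # restore chronological order
    return head + kept
```

Dropped messages are simply discarded here. A summarization or retrieval step could replace them with a shorter stand-in, as long as the stand-in is counted against the same budget.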
Journey Context:
OpenAI, Anthropic, and Google are all reported to use tiered pricing on some models, where requests with more than 128k tokens of context \(or sometimes 200k\) cost 2-4x more per token than the same tokens in an 8k context. The cost increase comes not just from having more tokens but from a multiplier applied to those tokens. Developers often fill the context window with 'just in case' history or full documents and cross the threshold accidentally. This report places the cliff at 128k for most providers; the exact threshold and multiplier vary by provider and model, so confirm them on the current pricing page. The fix is strict context window management: sliding windows, aggressive summarization of older turns, and a hard cap of 120k tokens to stay safely under the tier.
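To make the cliff concrete, here is a small worked calculation. The base rate of $1.00 per million input tokens and the 2x multiplier are hypothetical, not any provider's actual prices, and the sketch assumes the higher rate applies to every token in a request once the threshold is exceeded. The function name `input_cost_usd` is also an illustration only.

```python
def input_cost_usd(
    prompt_tokens: int,
    base_rate_per_mtok: float = 1.00,  # hypothetical USD per million tokens
    threshold: int = 128_000,          # tier boundary described in this report
    multiplier: float = 2.0,           # hypothetical; the report cites 2-4x
) -> float:
    """Input cost when the whole request is billed at the tier it falls into."""
    over = prompt_tokens > threshold
    rate = base_rate_per_mtok * (multiplier if over else 1.0)
    return prompt_tokens * rate / 1_000_000


under = input_cost_usd(120_000)  # stays in the base tier
over = input_cost_usd(130_000)   # crosses into the long-context tier

print(f"120k tokens: ${under:.2f}")   # 120k tokens: $0.12
print(f"130k tokens: ${over:.2f}")    # 130k tokens: $0.26
print(f"ratio: {over / under:.2f}x")  # ratio: 2.17x
```

Under these assumptions, about 8% more tokens more than doubles the cost of the request, which is why a cap a little below the threshold is cheaper than drifting just over it.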
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:51:28.320800+00:00— report_created — created