Report #38996
[cost_intel] Anthropic and OpenAI charge 2-4x premiums for context windows beyond 32k/128k tokens
Count tokens client-side and truncate retrieved context at the pricing tier boundary (reported here as 32k for Anthropic, 128k for OpenAI) instead of sending the full retrieved context
Journey Context:
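Input pricing is not always linear in context length. This report puts Anthropic Claude 3 Opus at $15/MTok for input up to 32k tokens and $30/MTok beyond that (2x); Anthropic's published list price for Claude 3 Opus input was a flat $15/MTok, so confirm any tier boundary against the provider's current pricing page before relying on it. OpenAI GPT-4-Turbo charges a flat $10/MTok for input across its 128k window, while the legacy GPT-4-32k was priced at double the 8k GPT-4 rate ($60/MTok versus $30/MTok for input). RAG pipelines often send the full retrieved context (e.g., top-10 chunks) without checking it against a tier boundary. Under the tiers quoted here, a 42k-token prompt pays the 2x rate on the 10k tokens past the 32k boundary: $0.48 + $0.30 = $0.78, versus $0.48 for a prompt capped at 32k. The fix is to count tokens client-side (tiktoken for OpenAI models, or an equivalent tokenizer for the target model) and hard-truncate at the 32k/128k boundary that applies to the model's pricing tier.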
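A minimal sketch of the arithmetic and the truncation step, under stated assumptions. The constants are the figures quoted in this report, not verified prices. `approx_token_count`, `input_cost_usd`, and `pack_chunks` are hypothetical helper names introduced for illustration. The word-split counter is only a stand-in for a real tokenizer, and the packer drops whole chunks at the budget instead of cutting mid-chunk, which is a gentler variant of the hard truncation described above.

```python
from typing import Callable, Iterable, List

# Assumed tier table: the figures quoted in this report, not verified prices.
TIER_BOUNDARY_TOKENS = 32_000
BASE_RATE_PER_MTOK = 15.00      # USD per million input tokens up to the boundary
PREMIUM_RATE_PER_MTOK = 30.00   # USD per million input tokens past the boundary


def approx_token_count(text: str) -> int:
    """Stand-in tokenizer: one token per whitespace-separated word.

    Replace with a real tokenizer for the target model before trusting counts.
    """
    return len(text.split())


def input_cost_usd(
    tokens: int,
    boundary: int = TIER_BOUNDARY_TOKENS,
    base_rate: float = BASE_RATE_PER_MTOK,
    premium_rate: float = PREMIUM_RATE_PER_MTOK,
) -> float:
    """Input cost when tokens past `boundary` are billed at `premium_rate`."""
    below = min(tokens, boundary)
    above = max(tokens - boundary, 0)
    return (below * base_rate + above * premium_rate) / 1_000_000


def pack_chunks(
    chunks: Iterable[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
    reserved_tokens: int = 0,
) -> List[str]:
    """Keep chunks in ranked order until the next one would exceed the budget.

    `reserved_tokens` sets aside room for the system prompt and user question.
    """
    remaining = budget_tokens - reserved_tokens
    kept: List[str] = []
    for chunk in chunks:
        needed = count_tokens(chunk)
        if needed > remaining:
            break  # stop at the first chunk that does not fit, preserving rank order
        kept.append(chunk)
        remaining -= needed
    return kept


if __name__ == "__main__":
    print(input_cost_usd(42_000))  # full prompt under the quoted tiers
    print(input_cost_usd(32_000))  # prompt capped at the boundary
    print(pack_chunks(["a b c", "d e", "f g h i"], budget_tokens=6))
```

For OpenAI models, `count_tokens` could be backed by tiktoken, for example `enc = tiktoken.get_encoding("cl100k_base")` and `lambda s: len(enc.encode(s))`. tiktoken is OpenAI's tokenizer, so its counts are only an approximation for Anthropic models; leave headroom below the boundary or use the provider's own token counting there.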
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:55:31.250771+00:00— report_created — created