Report #77711
[cost_intel] GPT-4o tokenizer counting 30% fewer tokens than GPT-4-turbo for identical code, causing budget overruns
Re-tokenize all historical data with the target model's tokenizer (tiktoken with 'o200k_base' for GPT-4o) before setting budgets; maintain separate token accounting per model in the routing layer.
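A minimal sketch of the re-tokenization step, assuming historical prompts are available as plain strings. The `MODEL_ENCODINGS` table and the helpers `encoding_name_for`, `tiktoken_counter`, and `retokenize` are hypothetical names for illustration. The only real library calls are `tiktoken.get_encoding()` and `Encoding.encode()`. If you would rather not maintain a table, `tiktoken.encoding_for_model()` is the authoritative lookup. You need a tiktoken release recent enough to ship o200k_base.

```python
"""Re-tokenize stored prompts with the encoding of the model they will be billed against."""
from typing import Callable, Dict, Iterable, List

# Assumption: hand-maintained copy of the model -> encoding mapping.
# tiktoken.encoding_for_model() is the authoritative source.
MODEL_ENCODINGS: Dict[str, str] = {
    "gpt-4o": "o200k_base",
    "gpt-4o-mini": "o200k_base",
    "gpt-4-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}


def encoding_name_for(model: str) -> str:
    """Return the encoding name for a model, failing loudly instead of guessing."""
    try:
        return MODEL_ENCODINGS[model]
    except KeyError:
        raise ValueError(f"no encoding registered for model {model!r}") from None


def tiktoken_counter(model: str) -> Callable[[str], int]:
    """Build a token counter for one model, backed by tiktoken."""
    import tiktoken  # imported lazily so the table above is usable without tiktoken

    enc = tiktoken.get_encoding(encoding_name_for(model))
    return lambda text: len(enc.encode(text))


def retokenize(history: Iterable[str], count: Callable[[str], int]) -> List[int]:
    """Recount every historical prompt with the target model's counter."""
    return [count(text) for text in history]
```

For example, `sum(retokenize(prompts, tiktoken_counter("gpt-4o")))` gives a GPT-4o-denominated total for a list of stored prompts. Repeat the call for each candidate model rather than scaling one total by a fixed ratio. These counts cover raw text only and exclude any chat formatting overhead the API adds.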
Journey Context:
OpenAI changed the tokenizer from cl100k_base (GPT-4, GPT-4-turbo, GPT-3.5) to o200k_base (GPT-4o, GPT-4o-mini). The new encoding is more compact, so the same English text or code usually produces fewer tokens under o200k_base. The reduction is often on the order of 10-30% and varies with language and content type. As a result, any system that estimates cost with one encoding and is billed under the other will be systematically wrong. Counting with cl100k_base overstates GPT-4o usage. Budgets derived from o200k_base counts, such as budgets built from GPT-4o history, are exceeded when the same traffic is routed to GPT-4-turbo or another cl100k_base model, because the API then reports more tokens than were budgeted. The signature is a consistent gap between tiktoken estimates and the token counts in the API's usage metadata. The fix is to select the encoding that matches the model actually being called: 'o200k_base' for GPT-4o calls (as documented in OpenAI's tokenizer guide) and 'cl100k_base' for GPT-4-turbo and older models. Additionally, when A/B testing models, maintain separate token accounting per model in the routing layer rather than assuming a linear conversion rate between encodings.
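A sketch of per-model accounting in the routing layer follows. It assumes each call records both the local tiktoken estimate and the prompt-token count from the API's usage metadata. `TokenLedger`, `ModelTally`, and the 5% default tolerance are illustrative choices, not part of any library. If one model's ratio of billed to estimated tokens sits well away from 1.0, that is the signature described above. A small excess is expected, because API counts include per-message chat formatting tokens that raw-text estimates leave out.

```python
"""Per-model token ledger for a routing layer (illustrative, not a library API)."""
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModelTally:
    estimated: int = 0  # tokens predicted locally (e.g. with tiktoken) before each call
    actual: int = 0     # tokens reported in the API's usage metadata
    calls: int = 0


class TokenLedger:
    """Keeps estimates and billed usage separate per model; never converts between models."""

    def __init__(self, tolerance: float = 0.05) -> None:
        self.tolerance = tolerance  # assumed acceptable relative gap between estimate and bill
        self._tallies: Dict[str, ModelTally] = {}

    def record(self, model: str, estimated: int, actual: int) -> None:
        tally = self._tallies.setdefault(model, ModelTally())
        tally.estimated += estimated
        tally.actual += actual
        tally.calls += 1

    def actual_tokens(self, model: str) -> int:
        tally = self._tallies.get(model)
        return tally.actual if tally else 0

    def drift(self, model: str) -> float:
        """Ratio of billed to estimated tokens; 1.0 means the estimator matches the model."""
        tally = self._tallies.get(model)
        if tally is None or tally.estimated == 0:
            raise ValueError(f"no estimates recorded for {model!r}")
        return tally.actual / tally.estimated

    def is_miscalibrated(self, model: str) -> bool:
        """True when estimates and billed usage differ by more than the tolerance."""
        return abs(self.drift(model) - 1.0) > self.tolerance
```

Each model's totals stay in their own tally, so a GPT-4-turbo budget is checked only against GPT-4-turbo usage. This avoids assuming a fixed conversion factor between the two encodings.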
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:02:19.733287+00:00— report_created — created