Report #31090
[cost_intel] Claude 3 token count 40% higher than GPT-4 for identical prompt, breaking cost models
Recalibrate token budgets per provider using each provider's official tokenizer or token-counting tool; pre-flight cost estimation must use provider-specific token counts, never cross-provider assumptions.
Journey Context:
GPT-4 uses the cl100k_base encoding (via tiktoken); Claude uses a different tokenizer with its own vocabulary and merge rules. The same string can therefore tokenize to noticeably different lengths. Claude often produces 20-40% more tokens for the same English text, reflecting differences in how the two tokenizers handle spaces, punctuation, and compound words, though the exact ratio depends on the content. A 1k-token GPT-4 prompt might be about 1.4k tokens on Claude. Developers switching providers for cost reasons often calculate savings by applying OpenAI token counts to Anthropic pricing, then face budget overruns of up to 40%. The error is assuming that 'a token' is a standardized unit across providers. The fix is mandatory provider-specific token counting: use Anthropic's official token counting for Claude, tiktoken for OpenAI, and the Gemini token counter for Google. Pre-flight estimation must always use the tokenizer of the target API.
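A minimal sketch of provider-specific pre-flight estimation, assuming each provider's token counter is passed in as a plain function. The `Provider` class, the stub counters, and both prices are hypothetical and only reproduce the 1k vs 1.4k example above. In real use the stubs would be replaced with the provider's own counting, for example tiktoken's `encoding_for_model(...).encode(...)` for OpenAI or the token-counting call in the Anthropic SDK for Claude; check the current SDK documentation before relying on either.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Provider:
    name: str
    count_tokens: Callable[[str], int]  # must be the provider's OWN counter
    usd_per_million_input: float        # hypothetical price, not a real quote


def estimate_input_cost(provider: Provider, prompt: str) -> float:
    """Pre-flight input cost in USD, using the target provider's token count."""
    tokens = provider.count_tokens(prompt)
    return tokens * provider.usd_per_million_input / 1_000_000


def overrun(assumed_tokens: int, actual_tokens: int) -> float:
    """Fractional budget overrun when a cross-provider count was assumed."""
    return (actual_tokens - assumed_tokens) / assumed_tokens


# Stand-in counters (assumption): they return the report's illustrative
# figures instead of calling a real tokenizer.
def stub_openai_count(text: str) -> int:
    return 1000


def stub_claude_count(text: str) -> int:
    return 1400


openai_provider = Provider("openai", stub_openai_count, usd_per_million_input=10.0)
claude_provider = Provider("claude", stub_claude_count, usd_per_million_input=8.0)

prompt = "same prompt text sent to both providers"

# Wrong: OpenAI token count multiplied by the other provider's price.
naive = openai_provider.count_tokens(prompt) * claude_provider.usd_per_million_input / 1_000_000
# Right: count tokens with the counter of the API that will be billed.
correct = estimate_input_cost(claude_provider, prompt)

print(f"naive estimate: ${naive:.4f}, provider-specific: ${correct:.4f}")
gap = overrun(openai_provider.count_tokens(prompt), claude_provider.count_tokens(prompt))
print(f"overrun: {gap:.0%}")
```

Keeping the counter as a field on the provider makes it hard to pair one provider's token count with another provider's price by accident, which is the mistake described above.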
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:34:22.492879+00:00 - report_created - created