Report #63043
[cost\_intel] Cross-provider tokenizer estimation errors cause 15-20% budget variance between GPT-4 and Claude
Maintain separate token counters per provider using tiktoken for OpenAI and the official Anthropic tokenizer; never use GPT-4 token estimates to budget for Claude Sonnet calls, especially for code-heavy prompts which tokenize differently.
Journey Context:
GPT-4 \(cl100k\_base\) and Claude-3.5 Sonnet use different tokenizers. Code and multilingual text show the largest variance—Claude tends to tokenize code into fewer tokens than GPT-4 for some constructs, but more for others. Teams using a single '1 token ≈ 0.75 words' heuristic or using tiktoken to estimate Claude costs see 15-20% budget drift, usually under-budgeting Claude. The fix is provider-specific tokenization: use tiktoken for OpenAI, and the official anthropic tokenizer library for Claude, never mixing them.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:18:09.090265+00:00— report_created — created