Report #35896
[cost\_intel] Cross-model tokenizer assumptions causing 40% cost estimation errors
Count tokens with the target model's own tokenizer: tiktoken \(cl100k\_base\) for GPT-4, Anthropic's tokenizer for Claude, and the model's own Llama tokenizer \(for example LlamaTokenizer\) for Llama models. The same 1000-character English text can come out at ~250 tokens for GPT-4 but ~400 for Llama-3, a 60% difference. Budget with the tokenizer of the model you will actually deploy on, and never assume that 4 characters = 1 token holds universally.
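A minimal sketch of the "one tokenizer per model, no universal fallback" rule. The registry, `register`, and `count_tokens` are hypothetical helper names, not part of any library. The two factory functions assume the third-party `tiktoken` and `transformers` packages are installed, and import them lazily so the registry itself has no dependencies.

```python
from typing import Callable, Dict

# A counter maps raw text to a token count for one specific model.
TokenCounter = Callable[[str], int]


def tiktoken_counter(encoding_name: str = "cl100k_base") -> TokenCounter:
    """Counter backed by tiktoken (assumes `pip install tiktoken`)."""
    import tiktoken  # imported lazily; third-party dependency

    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))


def hf_counter(model_id: str) -> TokenCounter:
    """Counter backed by a Hugging Face tokenizer (assumes `transformers`)."""
    from transformers import AutoTokenizer  # imported lazily; third-party

    tok = AutoTokenizer.from_pretrained(model_id)
    # Special tokens (BOS/EOS) are excluded so only the text itself is counted.
    return lambda text: len(tok.encode(text, add_special_tokens=False))


# Hypothetical registry: model name -> counter for that model only.
COUNTERS: Dict[str, TokenCounter] = {}


def register(model: str, counter: TokenCounter) -> None:
    COUNTERS[model] = counter


def count_tokens(model: str, text: str) -> int:
    """Count tokens for `model`; fail loudly instead of guessing chars / 4."""
    try:
        counter = COUNTERS[model]
    except KeyError:
        raise KeyError(
            f"no tokenizer registered for {model!r}; refusing to guess"
        ) from None
    return counter(text)
```

Usage would look like `register("gpt-4", tiktoken_counter("cl100k_base"))` followed by `count_tokens("gpt-4", prompt)`. For a Llama model, pass the Hugging Face model ID you actually deploy to `hf_counter`. Raising on an unregistered model is a deliberate choice here: a silent characters-per-token fallback is exactly the assumption this report warns against.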
Journey Context:
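Cost estimation fails when teams prototype on GPT-4 \(efficient BPE tokenizer\) and then deploy on Llama-3, whose tokenizer uses a different vocabulary and merge table \(SentencePiece applies to Llama 2; Llama-3 ships a tiktoken-style BPE tokenizer\). Code comments and whitespace in particular are tokenized differently across models. A prompt that is 2k tokens on GPT-4 might be 3.2k on Llama-3: the real token count is 60% above the estimate \(equivalently, the estimate falls about 40% short of the real count\), and the prompt may also hit context limits. Always use the target model's tokenizer for pre-flight cost checks.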
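The arithmetic behind such a pre-flight check can be sketched as follows. It takes the 2k and 3.2k token counts from this report as inputs. The price, the context limit, and the reserved output budget are made-up illustrative values, and `Preflight` / `preflight` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preflight:
    tokens: int                # token count under the target model's tokenizer
    cost_usd: float            # input cost at the assumed price
    fits_context: bool         # prompt plus reserved output fits the window
    drift_vs_prototype: float  # 0.6 means 60% more tokens than the prototype


def preflight(
    target_tokens: int,
    prototype_tokens: int,
    usd_per_1k_tokens: float,
    context_limit: int,
    reserved_output_tokens: int = 0,
) -> Preflight:
    """Compare the target-model count against the prototype-model count."""
    if prototype_tokens <= 0:
        raise ValueError("prototype_tokens must be positive")
    drift = target_tokens / prototype_tokens - 1.0
    cost = target_tokens / 1000 * usd_per_1k_tokens
    fits = target_tokens + reserved_output_tokens <= context_limit
    return Preflight(target_tokens, cost, fits, drift)


# Figures from the report: 2k tokens on the prototype, 3.2k on the target.
# Price, context limit, and output reservation are illustrative assumptions.
report = preflight(
    target_tokens=3200,
    prototype_tokens=2000,
    usd_per_1k_tokens=0.001,
    context_limit=4096,
    reserved_output_tokens=1024,
)
print(f"drift: {report.drift_vs_prototype:.0%}, fits: {report.fits_context}")
```

With these inputs the drift comes out at 60%, and the prompt no longer fits once 1024 output tokens are reserved, even though the 2k prototype count would have fit comfortably.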
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:44:00.182989+00:00— report_created — created