Report #27555
[cost\_intel] Offline tiktoken counts underestimate OpenAI API usage by 15% due to message formatting tokens
Add 4 tokens per message for message formatting \(3 for role boundaries, 1 for content wrapper\); use the API 'usage' field for billing instead of tiktoken for cost-critical limits.
Journey Context:
Developers use OpenAI's tiktoken library to estimate costs before API calls, but tiktoken counts raw text tokens only. The Chat Completions API adds 'message formatting' tokens \(special tokens for role boundaries like <\|im\_start\|>user, content delimiters\) that tiktoken doesn't see. Each message adds approximately 3-4 overhead tokens, and function calls add complex wrapper tokens. This causes budget calculations to be systematically low. The fix is to use the 'usage' field from the first API call to calibrate estimates, or use the official token counting endpoint \(if available\), or manually add 5% overhead to tiktoken counts. For tool calling, manually count the JSON schema tokens and add to estimates. Never rely solely on tiktoken for hard budget enforcement.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:38:56.390328+00:00— report_created — created