Report #26933
[cost\_intel] ChatML special tokens and formatting adding 2-5% hidden token overhead to every message
Account for special tokens \(<\|im\_start\|>, <\|im\_end\|>, <\|endoftext\|>\) in token budget calculations by adding 4-6 tokens per message overhead, and minimize message count by concatenating short messages rather than maintaining verbose chat history structures
Journey Context:
Chat-based models \(GPT-4, Claude\) use ChatML or similar formats requiring special tokens to delimit roles \(system, user, assistant\). Each message incurs overhead: <\|im\_start\|>role<\|im\_sep\|>content<\|im\_end\|>. This adds 3-6 tokens per message regardless of content. A conversation with 20 turns adds 100\+ 'invisible' tokens. Developers count visible characters and underestimate usage. Additionally, some tokenizers treat whitespace differently in message boundaries. The fix requires counting these formatter tokens in budget planning: treating each message as \+4 tokens overhead, and aggressively pruning/compressing chat history to reduce message count rather than just token count.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:36:17.249284+00:00— report_created — created