Report #83239
[cost\_intel] System prompt estimated at 100 tokens at design time, but API billing shows 500 tokens
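Count tokens with tiktoken before sending, and budget for what the API adds on top of your text: per-message chat formatting overhead \(roughly 3-4 tokens per message on OpenAI chat models, plus about 3 tokens that prime the reply\), ChatML delimiter tokens, and tool definitions that are injected into the prompt automatically.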
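A minimal sketch of the pre-send count, assuming the per-message overhead values published in OpenAI's cookbook guidance for gpt-3.5-turbo and gpt-4 era models \(3 tokens per message, 1 extra when a name is present, 3 to prime the reply\). These constants are assumptions and may differ for other models. The function name `estimate_chat_tokens` and the helper `tiktoken_encoder` are hypothetical; only `tiktoken.get_encoding` and `Encoding.encode` are real library calls.

```python
from typing import Callable, Mapping, Sequence

# Assumed overhead constants (cookbook guidance for gpt-3.5-turbo / gpt-4);
# verify against real usage figures for the model you actually call.
TOKENS_PER_MESSAGE = 3    # chat-format delimiters around each message
TOKENS_PER_NAME = 1       # extra token when a message carries a "name"
REPLY_PRIMING_TOKENS = 3  # every reply is primed with an assistant header


def estimate_chat_tokens(
    messages: Sequence[Mapping[str, str]],
    encode: Callable[[str], Sequence],
) -> int:
    """Estimate prompt tokens for a list of chat messages.

    `encode` is any function that turns text into a sequence of tokens,
    so the tokenizer can be swapped out. All message values must be strings.
    """
    total = REPLY_PRIMING_TOKENS
    for message in messages:
        total += TOKENS_PER_MESSAGE
        for key, value in message.items():
            total += len(encode(value))
            if key == "name":
                total += TOKENS_PER_NAME
    return total


def tiktoken_encoder(encoding_name: str = "cl100k_base") -> Callable[[str], Sequence]:
    """Return tiktoken's encode function (hypothetical convenience helper)."""
    import tiktoken  # third-party dependency, imported lazily

    return tiktoken.get_encoding(encoding_name).encode
```

With tiktoken installed, `estimate_chat_tokens(messages, tiktoken_encoder())` gives the estimate. Passing the tokenizer in as a parameter keeps the overhead arithmetic separate from the encoding, so the encoding name can be matched to the target model.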
Journey Context:
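Developers often underestimate prompt size by 3-5x because they count only the text they wrote. The API bills for: \(1\) the system prompt content, \(2\) the role label of each message \(the request's JSON keys 'role' and 'content' are not tokenized as such; each message is re-serialized into the chat format\), \(3\) ChatML delimiter tokens \(<\|im\_start\|>, <\|im\_end\|>\) around each message, \(4\) any tool definitions, which are serialized into the prompt automatically. A 'short' 100-token system prompt with 5 tool definitions can exceed 2,000 total context tokens when the tool schemas are verbose. Pre-calculating with tiktoken, and checking the estimate against the prompt token count the API reports, helps prevent budget overruns.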
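The tool-definition share of the budget can be approximated with a rough sketch like the one below. It assumes that the token count of each tool's compact JSON is a usable proxy for what the provider injects. The real serialization format is not specified in this report, so treat the result as an estimate and compare it with `usage.prompt_tokens` in the API response. The names `estimate_tool_tokens` and `prompt_budget` are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Mapping, Sequence


def estimate_tool_tokens(
    tools: Sequence[Mapping[str, Any]],
    encode: Callable[[str], Sequence],
) -> int:
    """Approximate tool overhead by tokenizing each tool's compact JSON.

    Assumption: JSON size tracks the size of the injected description.
    """
    return sum(
        len(encode(json.dumps(tool, separators=(",", ":"))))
        for tool in tools
    )


def prompt_budget(
    system_prompt: str,
    tools: Sequence[Mapping[str, Any]],
    encode: Callable[[str], Sequence],
) -> Dict[str, int]:
    """Break the estimate down so the largest contributor is visible."""
    system_tokens = len(encode(system_prompt))
    tool_tokens = estimate_tool_tokens(tools, encode)
    return {
        "system": system_tokens,
        "tools": tool_tokens,
        "total": system_tokens + tool_tokens,
    }
```

Here `encode` can be `tiktoken.get_encoding("cl100k_base").encode`, or whichever encoding matches the target model. Reporting the system and tool figures separately makes it easy to see when tool schemas, and not the system prompt, dominate the bill.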
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:18:22.636089+00:00— report_created — created