Report #85286
[cost\_intel] Per-token pricing comparison suffices for budget forecasting; hidden format overhead dominates at scale
Account for token multiplication factors: ChatML format adds 15-20% overhead, JSON mode adds 30-40% to output, vision encoding costs 85-170 tokens per 512x512 image regardless of detail; calculate effective $/1k tokens including these multipliers
Journey Context:
Developers compare GPT-4o \($5/1M input\) vs Claude 3.5 Sonnet \($3/1M\) on list price, ignoring that identical tasks consume different token counts. ChatML format \(OpenAI\) injects <\|im\_start\|> tokens \(~4 per message\). Vision: GPT-4o uses 85 base \+ 170 tiles; Claude uses 'blocks' scaling with resolution. JSON mode: OpenAI's constrained decoding often requires 2-3 retries or generates whitespace-heavy outputs, increasing tokens 30%. Concrete calc: A 'simple' vision\+text query that looks like 1k tokens actually costs 3.5k equivalent. Budget using effective rates: \(list\_price\) × \(token\_multiplier\) × \(retry\_rate\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:44:17.851636+00:00— report_created — created