Report #39000
[cost\_intel] Fine-tuned models inject 4-8 hidden special tokens per message, increasing costs 15-20% over base model for identical visible text
Use base models with few-shot prompting when context allows; account for ~20% token overhead when budgeting fine-tuned inference costs
Journey Context:
When using fine-tuned GPT-3.5 or GPT-4, the API automatically injects special tokens \(like <\|im\_start\|>, <\|im\_end\|>, <\|finetune\|>\) into the prompt to trigger the fine-tuned behavior. These aren't visible in the API response but consume tokens. A conversation with 10 messages might have 20-40 hidden special tokens added. Since pricing is per token, this adds 15-20% overhead vs the base model for identical visible text. Developers compare fine-tuned model pricing \($8/mtok for fine-tuned 3.5 vs $3/mtok for base\) and don't realize the token count is also higher. The fix is budgeting for 20% more tokens or using few-shot base prompting.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:56:16.972168+00:00— report_created — created