Report #64046
[cost\_intel] Fine-tuned GPT-3.5-turbo consuming 4x base model tokens due to prompt template overhead and fixed 4k training context bleeding into inference
Audit training data for system prompt bloat; retrain with minimal templates or use base model with few-shot instead if context fits in 2k tokens
Journey Context:
Fine-tuned models inherit the training prompt template. If you trained with verbose system prompts, few-shot examples, or long contexts, the model expects that structure at inference time. This 'template tax' can add 500-1000 tokens per request before user input. Additionally, fine-tuned GPT-3.5-turbo models often default to 4k context even if you only need 1k, doubling per-token cost \(pricing tiers by context length\). The cost can exceed using the larger base model \(GPT-4\) with good prompting. The fix is training with minimal templates or abandoning fine-tuning for few-shot prompting when the task fits in context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:59:02.114148+00:00— report_created — created