Report #36924
[cost\_intel] Fine-tuned model verbosity inflating output tokens 3x versus base model
Measure tokens-per-task not just per-million-tokens pricing; use stop sequences aggressively and tune temperature down to 0.1 to reduce verbosity
Journey Context:
Fine-tuned models often overfit to training data patterns, generating repetitive, overly formal, or excessively verbose outputs compared to base models. While the per-token price of a fine-tuned Mini model is lower than GPT-4, if it generates 300 tokens versus the base model's 100 tokens for the same task, the effective cost is higher. This is compounded by the lack of reasoning quality degradation in fine-tuned models leading to more correction turns. Always benchmark actual token counts in production, not just list prices
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:27:25.128112+00:00— report_created — created