Report #85884
[cost_intel] Fine-tuning when few-shot prompting or RAG is cheaper and sufficient
Do not fine-tune for task adaptation if you can hit your quality targets with under 2k tokens of few-shot examples or retrieved (RAG) context. Fine-tuning only beats prompting on cost-per-quality when daily volume exceeds 10k requests and the task requires strict adherence to a non-standard schema or style that rarely changes.
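The rule above can be sketched as a small check. This is a minimal illustration rather than a definitive policy: the `Workload` fields and the `should_consider_fine_tuning` name are hypothetical, and the thresholds are simply the ones stated in this report.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a task; field names are illustrative."""
    meets_quality_with_prompting: bool  # quality target reached with few-shot or RAG
    prompt_overhead_tokens: int         # few-shot/RAG tokens added to each request
    daily_requests: int
    strict_nonstandard_format: bool     # needs a strict, non-standard schema or style
    format_is_stable: bool              # that schema or style rarely changes


def should_consider_fine_tuning(w: Workload) -> bool:
    # Prompting or RAG already works within a compact prompt: do not fine-tune.
    if w.meets_quality_with_prompting and w.prompt_overhead_tokens < 2_000:
        return False
    # Otherwise all three conditions from the report must hold.
    return (
        w.daily_requests > 10_000
        and w.strict_nonstandard_format
        and w.format_is_stable
    )
```

A `True` result only means fine-tuning is worth costing out; it does not replace measuring quality and spend on the real workload.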
Journey Context:
Teams often fine-tune to 'teach' a model domain knowledge that belongs in RAG, or to learn output formats that could be enforced with JSON schemas. Fine-tuning carries high fixed costs (data prep, training) and is inflexible (any change in behavior requires retraining). The breakpoint comes when prompt engineering with a Haiku- or Flash-class model still fails to meet latency or cost targets after optimization. The signature of a good fine-tuning candidate: high volume (>10k requests/day), a stable schema, and few-shot prompting that bloats each request by more than 4k tokens. Alternative: prompt caching reduces the cost penalty of long prompts and often eliminates the need to fine-tune.
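The trade-off between fixed fine-tuning cost and per-request prompt overhead can be estimated with back-of-the-envelope arithmetic. The sketch below is an assumption-heavy illustration: the function names, the prices, the cache discount, and the fixed cost are all hypothetical placeholders, and it assumes the fine-tuned model removes the prompt overhead entirely at the same per-token price (fine-tuned inference is often priced higher, which pushes break-even further out).

```python
from typing import Optional


def daily_prompt_cost(
    requests_per_day: int,
    prompt_tokens: int,
    price_per_mtok: float,
    cached_fraction: float = 0.0,
    cache_discount: float = 0.0,
) -> float:
    """Daily input cost (in currency units) of the few-shot/RAG prefix.

    cached_fraction: share of prefix tokens served from a prompt cache (0..1).
    cache_discount: fractional price reduction on cached tokens (0.9 = 90% off).
    All values are hypothetical inputs, not provider quotes.
    """
    effective_tokens = prompt_tokens * (1 - cached_fraction * cache_discount)
    return requests_per_day * effective_tokens * price_per_mtok / 1_000_000


def break_even_days(fixed_cost: float, daily_saving: float) -> Optional[float]:
    """Days until fine-tuning's fixed cost is repaid; None if it never is."""
    if daily_saving <= 0:
        return None
    return fixed_cost / daily_saving


# Illustrative numbers only: 10k requests/day, 4k-token prefix, 1.00 per Mtok.
uncached = daily_prompt_cost(10_000, 4_000, 1.00)
cached = daily_prompt_cost(10_000, 4_000, 1.00,
                           cached_fraction=1.0, cache_discount=0.9)

print(break_even_days(2_000, uncached))  # fixed cost of 2,000 is hypothetical
print(break_even_days(2_000, cached))
```

With these placeholder inputs, the uncached prefix costs about 40 per day, so a 2,000 fixed cost is repaid in roughly 50 days. With a fully cached prefix at a 90% discount, the daily cost drops to about 4 and break-even stretches to roughly 500 days, which is why caching often removes the case for fine-tuning.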
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:44:26.872474+00:00— report_created — created