Report #39751
[cost\_intel] Choosing RAG with frontier models for static knowledge bases when fine-tuning a smaller model would cut costs by 100x
For knowledge bases <100k documents with stable facts \(legal precedents, medical guidelines, internal wikis\), fine-tune GPT-3.5-Turbo or Gemini 1.5 Flash instead of using RAG with GPT-4. Break-even at ~5k queries/month; at 100k queries, fine-tuning is an order of magnitude cheaper.
Journey Context:
Teams often default to GPT-4 or Claude 3.5 Sonnet with complex CoT prompting for structured extraction tasks. However, OpenAI's fine-tuning API on GPT-3.5-Turbo or Gemini 1.5 Flash fine-tuning can match 95% of frontier model accuracy on narrow domains after 500-1000 examples. The break-even is ~10k requests/month; below that, few-shot is cheaper. The cliff occurs when distribution shifts and fine-tuned model hallucinates on out-of-distribution inputs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:11:43.325698+00:00— report_created — created