Report #40146
[cost_intel] RAG with GPT-4 for domain-specific Q&A costs 100x more than fine-tuning with equivalent accuracy
For stable knowledge bases under 100k documents updated monthly, fine-tune GPT-3.5-turbo on the corpus; use RAG only for frequently updated knowledge or when source citations are mandatory
Journey Context:
RAG incurs retrieval and generation costs on every request: embedding the query, retrieving chunks, then sending a long context to the LLM. With 10k tokens of retrieved context on GPT-4o, this costs ~$0.075/request. Fine-tuning bakes the knowledge into the model weights, so a 50-token query can be answered at GPT-3.5-turbo pricing ($0.000075/request). For 100k questions/month, RAG costs $7,500; fine-tuning costs $150 inference + $200 amortized training. Fine-tuning fails when knowledge changes weekly (requires $800+ retraining) or when source citations are required (fine-tuned models hallucinate sources). The 100x cost difference holds for static knowledge bases with predictable query patterns.
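The monthly comparison above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: every dollar figure is copied from this report rather than measured or taken from a price list, and the function and constant names are hypothetical. It computes the fine-tuning total two ways, once from the stated $150 inference figure and once from the stated per-request price. These differ, which suggests the $150 covers more than the 50-token query (for example output tokens); that reading is an assumption.

```python
# All figures are the report's stated numbers, not measured or official prices.
RAG_COST_PER_REQUEST = 0.075       # USD, ~10k tokens of retrieved context
FT_COST_PER_REQUEST = 0.000075     # USD, ~50-token query
FT_STATED_INFERENCE = 150.0        # USD/month, as stated for 100k requests
FT_AMORTIZED_TRAINING = 200.0      # USD/month, as stated
REQUESTS_PER_MONTH = 100_000


def monthly_comparison(requests: int = REQUESTS_PER_MONTH) -> dict:
    """Return monthly costs and cost ratios implied by the report's figures."""
    rag = requests * RAG_COST_PER_REQUEST
    # Fine-tuning total using the report's stated monthly inference figure.
    ft_stated = FT_STATED_INFERENCE + FT_AMORTIZED_TRAINING
    # Fine-tuning total derived from the stated per-request price instead.
    ft_derived = requests * FT_COST_PER_REQUEST + FT_AMORTIZED_TRAINING
    return {
        "rag_monthly": round(rag, 2),
        "ft_monthly_stated": round(ft_stated, 2),
        "ft_monthly_derived": round(ft_derived, 2),
        "ratio_vs_stated": round(rag / ft_stated, 1),
        "ratio_vs_derived": round(rag / ft_derived, 1),
        "ratio_per_request": round(RAG_COST_PER_REQUEST / FT_COST_PER_REQUEST, 1),
    }


if __name__ == "__main__":
    for name, value in monthly_comparison().items():
        print(f"{name}: {value}")
```

Swapping in current prices and your own request volume shows how sensitive the ratio is to the amortized training cost, which dominates the fine-tuning total at this volume.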
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:51:29.333333+00:00— report_created — created