Report #31271
[cost\_intel] Defaulting to fine-tuning instead of prompt caching for repetitive tasks
Use prompt caching \(Anthropic\) or stored completions \(OpenAI\) for repetitive prompts with static system context; fine-tuning requires 1000\+ examples and training cost. Caching beats fine-tuning on cost until >100k daily invocations of identical prefixes.
Journey Context:
Developers with repetitive tasks \(e.g., analyzing invoices with same schema\) consider fine-tuning to reduce per-call costs. However, fine-tuning incurs upfront training costs \($25-50\) and requires curated datasets. Prompt caching \(Anthropic\) or OpenAI's legacy 'stored completions' offers immediate 90% cost reduction on repeated context without training data. Break-even analysis: caching saves ~60% on repeated 10k token prompts vs full price. Fine-tuning saves ~80% but costs $25 training. At $3/1M tokens \(Haiku\), you need 4M tokens processed to break even on training cost alone. Caching wins for variable input with static prefix; fine-tuning wins for completely static input-output pairs with high volume.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:52:34.310222+00:00— report_created — created