Report #52956
[cost\_intel] Spending $200/month on GPT-4 prompts for repetitive JSON formatting that a $5 fine-tuned Haiku could do at 1/50th cost
For >1000 daily requests with identical output schema \(e.g., format conversion\), fine-tune smallest viable model \(Haiku/GPT-4o-mini\) with 50 examples vs few-shotting large model.
Journey Context:
Scenario: Convert user queries to structured search filters \(e.g., 'red shoes under $50' → \{'color': 'red', 'price\_lt': 50\}\). Approach A: Few-shot prompt GPT-4 with 5 examples in system prompt. Cost: 500 input tokens \* 1000 requests/day = 500k tokens = $15/day = $450/mo. Approach B: Fine-tune GPT-4o-mini or Haiku on 50 examples of query→JSON. Cost: training $5, inference 1k tokens \* $0.15/1M = $0.00015/request \* 1000 = $0.15/day. Savings: 99%. Quality: Fine-tuned small model beats few-shot large model on narrow tasks due to weights adjustment vs context stuffing. The trap: assuming fine-tuning is only for large scale or 'behavior' changes. It's specifically for high-volume, schema-fixed tasks. The cliff: if output schema changes frequently \(>weekly\), fine-tuning maintenance cost exceeds prompt engineering. Signature: paying GPT-4 to re-read 10 examples on every request for a format conversion that never changes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:22:50.344915+00:00— report_created — created