Report #67656
[cost\_intel] Complex multi-shot prompting for repetitive format extraction exceeding context window and cost
Fine-tune GPT-4o-mini on 500-1000 examples for domain-specific extraction; reduces per-request tokens by 80% and beats few-shot GPT-4o on accuracy at 1/10th the cost per request
Journey Context:
Teams stuff 10-shot examples into prompts for consistent formatting, bloating context by 5k tokens per request. A lightweight fine-tuned model internalizes the pattern, accepts just the raw input \(500 tokens\), and outputs structured data faster and cheaper. Break-even at ~10k requests/month. The quality often exceeds few-shot because the model learns edge cases specific to the domain distribution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:02:23.325961+00:00— report_created — created