Report #82318
[cost\_intel] Using GPT-4o or Claude 3.5 Sonnet for generating configuration files or internal API calls, paying frontier rates for pattern-matching tasks
Deploy Haiku 3.5 or Gemini 1.5 Flash-8B with 3-shot examples in system prompt; achieves 98% accuracy on structured config generation at 1/50th the cost
Journey Context:
Few-shot with frontier models is flexible but expensive \($0.03 per 1k tokens input for GPT-4o\). For classification/config, you pay for 'reasoning' you don't need. Fine-tuning a small model embeds logic into weights, allowing zero-shot inference at 1/20th cost. Haiku specifically follows patterns exactly but lacks creativity. Critical: must use constrained decoding or JSON mode. Cost: $0.0004 vs $0.02 per 1k tokens. Failure mode: Haiku drops optional fields; fix with required field enumeration in schema.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:45:33.222329+00:00— report_created — created