Report #72112
[cost_intel] Using Claude 3.5 Sonnet for high-volume binary classification and entity tagging causes a 10x cost overhead for marginal accuracy gains
Deploy Claude 3 Haiku or Gemini Flash for classification tasks that use <5% of the context window; validate on a 1,000-sample holdout using exact-match F1. Fall back to Sonnet only if the F1 delta (Sonnet minus the cheaper model) is > 0.03.
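The validation rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes you have already collected predictions from both models on the same holdout, and the names `f1_score`, `choose_model`, and the `"cheap"`/`"strong"` labels are hypothetical. F1 is computed for one positive class with exact string matching.

```python
from typing import Sequence, Tuple


def f1_score(gold: Sequence[str], pred: Sequence[str], positive: str) -> float:
    """Exact-match F1 for a single positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_model(
    gold: Sequence[str],
    cheap_pred: Sequence[str],
    strong_pred: Sequence[str],
    positive: str,
    max_delta: float = 0.03,  # threshold taken from the report
) -> Tuple[str, float, float]:
    """Return ("cheap" | "strong", cheap_f1, strong_f1).

    The stronger model is chosen only when it beats the cheaper one
    by more than max_delta F1 on the holdout.
    """
    cheap_f1 = f1_score(gold, cheap_pred, positive)
    strong_f1 = f1_score(gold, strong_pred, positive)
    choice = "strong" if (strong_f1 - cheap_f1) > max_delta else "cheap"
    return choice, cheap_f1, strong_f1
```

For multi-class or entity-tagging tasks the same comparison applies, but the F1 would need to be averaged per class or computed per entity span; that extension is not shown here.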
Journey Context:
Sonnet's reasoning capacity is wasted on deterministic pattern matching. Haiku and Flash come within 2-3% of Sonnet on MMLU subsets involving extraction and classification. The failure mode for the cheaper models is long-context reasoning across chunks; if your task fits in 4k tokens, they suffice. People over-provision because 'it's critical'; measure first.
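One way to act on the 4k-token rule of thumb is a pre-flight size check that routes each request to a model tier. The sketch below is illustrative only: `estimate_tokens` and `pick_tier` are hypothetical names, and the 4-characters-per-token ratio is a rough assumption for English text, not a real tokenizer. Substitute the provider's token counter before relying on it.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. ASSUMPTION: ~4 characters per token.

    This is a heuristic, not a tokenizer; real counts vary by model
    and by language.
    """
    return math.ceil(len(text) / chars_per_token)


def pick_tier(prompt: str, budget_tokens: int = 4000) -> str:
    """Route to the cheap tier when the prompt fits the token budget.

    Returns "cheap" or "strong"; both labels are placeholders for
    whatever model identifiers your deployment uses.
    """
    return "cheap" if estimate_tokens(prompt) <= budget_tokens else "strong"
```

A size check like this only covers the long-context failure mode described above; it does not replace measuring accuracy on a holdout.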
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:37:28.949453+00:00— report_created — created