Report #40140
[cost_intel] Claude 3 Haiku fails on open-ended generation but matches Sonnet on binary classification
Deploy Haiku for single-label classification under 20 classes and sentiment analysis; escalate to Sonnet for summarization, creative writing, or multi-label classification with >5 labels
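The routing rule above can be sketched as a small dispatcher. This is a minimal illustration, not a tested production policy: the `Task` type, the task-kind strings, and `choose_tier` are hypothetical names, and the tier labels are placeholders rather than real API model IDs. The report leaves two cases unspecified (single-label with 20 or more classes, and multi-label with 5 or fewer labels); the sketch sends the first to Sonnet and the second to Haiku, and both choices are assumptions.

```python
from dataclasses import dataclass

# Placeholder tier labels (hypothetical); map them to real model IDs yourself.
HAIKU = "haiku"
SONNET = "sonnet"


@dataclass(frozen=True)
class Task:
    # One of: "sentiment", "single_label", "multi_label",
    # "summarization", "creative" (hypothetical task-kind names).
    kind: str
    num_labels: int = 0  # only meaningful for classification tasks


def choose_tier(task: Task) -> str:
    """Pick a model tier following the report's recommendation."""
    if task.kind == "sentiment":
        return HAIKU
    if task.kind == "single_label" and task.num_labels < 20:
        return HAIKU
    if task.kind == "multi_label" and task.num_labels <= 5:
        # Assumption: the report only says to escalate above 5 labels.
        return HAIKU
    # Summarization, creative writing, large label sets, and anything
    # unrecognized fall through to the stronger model.
    return SONNET


if __name__ == "__main__":
    for t in (Task("sentiment"), Task("multi_label", 8), Task("summarization")):
        print(t, "->", choose_tier(t))
```

Defaulting unknown task kinds to Sonnet is a deliberately conservative choice: a misrouted task costs more rather than silently degrading quality.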
Journey Context:
Haiku's smaller parameter count limits coherence over long generation spans, causing repetition and hallucination in open-ended tasks. For discriminative tasks such as classification, however, Haiku's latent representations achieve accuracy within 2-3% of Sonnet on MMLU and sentiment benchmarks at roughly 1/12th the cost ($0.25/1M vs $3/1M tokens). The quality cliff appears with generation length: Haiku's output quality degrades after ~500 tokens, while Sonnet maintains coherence to 4k+ tokens. Critical constraint: Haiku loses track of distant context (>50k tokens) more rapidly than Sonnet, causing classification errors when evidence is spread across long documents.
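To make the cost figures and the two thresholds concrete, here is a small sketch. The prices are the per-million-token figures quoted above, applied as a single flat rate (an assumption; real billing may price input and output tokens differently). The ~500-token output limit and the 50k-token context limit are taken from this report and treated as hard cutoffs purely for illustration. `cost_usd` and `needs_sonnet` are hypothetical helper names.

```python
# Prices in USD per 1M tokens, as quoted in this report.
HAIKU_PRICE_PER_MTOK = 0.25
SONNET_PRICE_PER_MTOK = 3.00

# Thresholds reported above, used here as hard cutoffs (an assumption).
MAX_HAIKU_OUTPUT_TOKENS = 500
MAX_HAIKU_CONTEXT_TOKENS = 50_000


def cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million rate."""
    return tokens / 1_000_000 * price_per_mtok


def needs_sonnet(context_tokens: int, expected_output_tokens: int) -> bool:
    """True when a request exceeds either reported Haiku limit."""
    return (
        context_tokens > MAX_HAIKU_CONTEXT_TOKENS
        or expected_output_tokens > MAX_HAIKU_OUTPUT_TOKENS
    )


if __name__ == "__main__":
    monthly_tokens = 10_000_000  # hypothetical workload
    print("Haiku: ", cost_usd(monthly_tokens, HAIKU_PRICE_PER_MTOK))
    print("Sonnet:", cost_usd(monthly_tokens, SONNET_PRICE_PER_MTOK))
    print("Price ratio:", SONNET_PRICE_PER_MTOK / HAIKU_PRICE_PER_MTOK)
    print("Escalate 80k-token document:", needs_sonnet(80_000, 10))
```

At the quoted prices the Sonnet-to-Haiku ratio works out to 12x, so a hypothetical 10M-token classification workload costs $2.50 on Haiku versus $30.00 on Sonnet.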
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:50:45.436711+00:00— report_created — created