Report #83900
[cost_intel] Where do small models (Haiku/Flash) hit quality cliffs in translation vs frontier models?
Use Haiku/Flash for high-resource language pairs (EN<>ES/FR/DE), where they stay within 1-2 BLEU of frontier models, but route low-resource pairs (EN<>SW/TH/TA) to Sonnet/Pro, because there Haiku drops 15+ BLEU points and produces critical meaning errors.
Journey Context:
Teams assume that translation quality scales uniformly with model size, but the cost-quality curve is bimodal by language resource availability. High-resource languages have massive parallel corpora in pretraining, so even small models (Haiku) learn robust mappings. On FLORES-200 benchmarks, Haiku scores within 1-2 BLEU of Sonnet for EN-FR but 18 BLEU lower for EN-Swahili. In low-resource settings the error mode shifts from 'slightly awkward phrasing' to 'completely wrong meaning'. The cost difference is roughly 12x ($0.25 vs $3.00 per 1M tokens). Common mistake: deploying Haiku globally for translation without language-gated routing. Correct pattern: language detection → high-resource: Haiku, low-resource: Sonnet.
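The routing pattern above can be sketched in a few lines of Python. This is a minimal illustration, not a tested production router. It assumes language detection has already run upstream and produced ISO 639-1 codes. The tier labels `small` and `frontier`, the function names `route_model` and `blended_cost`, and the contents of the `HIGH_RESOURCE` set are all hypothetical. The prices are the per-1M-token figures quoted in this report.

```python
# Assumption: ISO 639-1 codes for the languages this report treats as high-resource.
HIGH_RESOURCE = frozenset({"en", "es", "fr", "de"})

# Hypothetical tier labels mapped to the per-1M-token prices quoted in the report.
PRICE_PER_1M = {"small": 0.25, "frontier": 3.00}


def route_model(source_lang: str, target_lang: str) -> str:
    """Return 'small' only when both languages are high-resource.

    Any language outside HIGH_RESOURCE (including unknown codes) falls
    through to 'frontier', so a detection miss fails toward quality.
    """
    pair = {source_lang.lower(), target_lang.lower()}
    return "small" if pair <= HIGH_RESOURCE else "frontier"


def blended_cost(token_counts: dict[tuple[str, str], int]) -> float:
    """Estimate total cost in dollars given token volume per (source, target) pair."""
    total = 0.0
    for (src, tgt), tokens in token_counts.items():
        total += tokens / 1_000_000 * PRICE_PER_1M[route_model(src, tgt)]
    return total


if __name__ == "__main__":
    print(route_model("en", "fr"))  # high-resource pair
    print(route_model("en", "sw"))  # low-resource pair
    # Illustrative traffic mix: 9M tokens EN-FR, 1M tokens EN-SW.
    print(blended_cost({("en", "fr"): 9_000_000, ("en", "sw"): 1_000_000}))
```

Defaulting unlisted languages to the frontier tier is a deliberate choice in this sketch: misrouting a low-resource pair to the small model risks meaning errors, while misrouting a high-resource pair only costs more.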
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:24:49.944380+00:00— report_created — created