Report #44165
[cost_intel] Translation costs inflated by using frontier models for high-resource language pairs where smaller models are sufficient
For high-resource language pairs (EN↔FR/ES/DE/PT/IT/NL), use a smaller model such as Haiku or Flash: quality is within 3-5% of frontier models on BLEU/COMET at 10-15x lower cost. Reserve frontier models for: (1) low-resource language pairs (EN↔HU/TH/VI/AR, where frontier models are 15-25% better), (2) domain-specific content with specialized terminology (medical, legal; adds a 10-15% quality gap), (3) content requiring cultural adaptation rather than literal translation.
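The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested production router: the `TranslationJob` type, the `choose_tier` name, the tier labels, and the lowercase language codes are all hypothetical, and pairs the report does not cover (anything not involving EN) are conservatively sent to the frontier tier as an assumption.

```python
from dataclasses import dataclass

# Language and domain lists come from the report; the codes are illustrative.
HIGH_RESOURCE = {"fr", "es", "de", "pt", "it", "nl"}
SPECIALIZED_DOMAINS = {"medical", "legal"}


@dataclass(frozen=True)
class TranslationJob:
    """Hypothetical job description used only for this sketch."""
    source_lang: str
    target_lang: str
    domain: str = "general"
    needs_cultural_adaptation: bool = False


def choose_tier(job: TranslationJob) -> str:
    """Return "small" or "frontier" following the report's three exceptions."""
    langs = {job.source_lang.lower(), job.target_lang.lower()}
    other = langs - {"en"}
    # The report only discusses EN<->X pairs; anything else is treated
    # conservatively (assumption) and routed to the frontier tier.
    is_high_resource = "en" in langs and len(other) == 1 and other <= HIGH_RESOURCE
    if not is_high_resource:
        return "frontier"  # (1) low-resource or uncovered pair
    if job.domain.lower() in SPECIALIZED_DOMAINS:
        return "frontier"  # (2) specialized terminology
    if job.needs_cultural_adaptation:
        return "frontier"  # (3) cultural adaptation, not literal translation
    return "small"
```

For example, `choose_tier(TranslationJob("en", "fr"))` routes to the small tier, while the same pair with `domain="legal"` routes to the frontier tier.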
Journey Context:
Translation quality on common European language pairs has been effectively commoditized. The training data for EN↔FR/ES/DE is so abundant that even smaller models have seen millions of parallel examples. The quality gap emerges in two scenarios. First, low-resource languages, where training data is scarce and frontier models compensate with better cross-lingual transfer. Second, domain-specific translation, where 'treatment' in a medical context means 'therapy', not 'handling', and frontier models maintain context better. The degradation signature on smaller models is literal translation of idioms and domain terms: 'break a leg' translated literally, or 'consideration' in a legal contract translated as 'thoughtfulness' instead of 'something of value given in exchange'. For high-volume translation pipelines processing common pairs, the cost difference is dramatic: with a Haiku-class model priced 10-15x below a frontier model at $15/M input + $75/M output tokens, a 10K-document batch could cost roughly $50 vs $800.
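The batch arithmetic can be worked through with a short sketch. The frontier prices ($15/M input, $75/M output) are taken from the report; everything else is an assumption made for illustration: the `batch_cost` helper is hypothetical, the per-document token counts (1,000 in, 1,000 out) are not stated in the report, and the small-model prices are derived by applying the upper end of the report's 10-15x ratio rather than taken from any price list.

```python
def batch_cost(num_docs, in_tokens_per_doc, out_tokens_per_doc,
               in_price_per_m, out_price_per_m):
    """Total USD cost of a batch, given per-million-token prices."""
    in_millions = num_docs * in_tokens_per_doc / 1_000_000
    out_millions = num_docs * out_tokens_per_doc / 1_000_000
    return in_millions * in_price_per_m + out_millions * out_price_per_m


# Frontier prices as quoted in the report (USD per million tokens).
FRONTIER_IN, FRONTIER_OUT = 15.0, 75.0

# Assumption: small model priced 15x lower (upper end of the 10-15x range).
ASSUMED_RATIO = 15
SMALL_IN, SMALL_OUT = FRONTIER_IN / ASSUMED_RATIO, FRONTIER_OUT / ASSUMED_RATIO

# Assumption: 10K documents at 1,000 input and 1,000 output tokens each.
DOCS, IN_TOK, OUT_TOK = 10_000, 1_000, 1_000

frontier_total = batch_cost(DOCS, IN_TOK, OUT_TOK, FRONTIER_IN, FRONTIER_OUT)
small_total = batch_cost(DOCS, IN_TOK, OUT_TOK, SMALL_IN, SMALL_OUT)

print(f"frontier: ${frontier_total:,.2f}")
print(f"small:    ${small_total:,.2f}")
print(f"savings:  ${frontier_total - small_total:,.2f}")
```

Under these assumed document sizes the totals land in the same range as the report's $50 vs $800 figures; the exact numbers shift with the real input/output token mix, so substitute measured token counts and current list prices before budgeting.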
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:36:06.609569+00:00— report_created — created