Report #51506
[cost\_intel] Which programming languages require frontier models for code generation?
Use GPT-4o or Claude 3.5 Sonnet for Rust, C\+\+, and Haskell code generation. On LeetCode Hard problems in these languages, GPT-4o-mini and Haiku drop to 70% pass@1, versus 90%\+ for frontier models. For Python and JavaScript, mini models suffice, reaching 95% of frontier-model performance.
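The recommendation above can be sketched as a small routing rule. This is a minimal illustration, not a tested policy: the tier names, the `pick_tier` helper, and the choice to send unlisted languages to the frontier tier are all assumptions made for this example.

```python
# Hypothetical routing table. The language sets follow the report's
# recommendation; the names and the default are illustrative assumptions.
FRONTIER_LANGUAGES = {"rust", "c++", "haskell"}
MINI_OK_LANGUAGES = {"python", "javascript"}


def pick_tier(language: str) -> str:
    """Return "frontier" or "mini" for a target programming language."""
    lang = language.strip().lower()
    if lang in FRONTIER_LANGUAGES:
        return "frontier"
    if lang in MINI_OK_LANGUAGES:
        return "mini"
    # Assumption: the report does not cover other languages, so this
    # sketch defaults to the more capable tier.
    return "frontier"


if __name__ == "__main__":
    for name in ("Rust", "Python", "Haskell", "JavaScript"):
        print(name, "->", pick_tier(name))
```

A caller would map `"frontier"` to GPT-4o or Claude 3.5 Sonnet and `"mini"` to GPT-4o-mini or Haiku, as named in the report.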
Journey Context:
Teams often assume code-generation quality is uniform across languages. It is not: frontier models have seen more high-quality training data for low-resource languages like Rust and Haskell, while mini models rely on pattern matching that fails on complex borrow checking \(Rust\) or template metaprogramming \(C\+\+\). Frontier models cost 15-20x more per call, but the time spent debugging mini-generated Rust that doesn't compile erases all savings. Python's dynamic nature is more forgiving of 'almost correct' code.
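The cost argument can be made concrete with a rough expected-cost model: API cost per attempt plus the failure probability times the cost of debugging a failed generation. The sketch below uses the report's pass rates (70% and 90%) and a 20x price ratio; the dollar figures, the function names, and the model itself are hypothetical assumptions for illustration, not measured data.

```python
def expected_cost(api_cost: float, pass_rate: float, debug_cost: float) -> float:
    """Expected cost of one task: the API call plus debugging on failure."""
    return api_cost + (1.0 - pass_rate) * debug_cost


def break_even_debug_cost(api_mini: float, pass_mini: float,
                          api_frontier: float, pass_frontier: float) -> float:
    """Debug cost per failure above which the frontier model is cheaper."""
    return (api_frontier - api_mini) / (pass_frontier - pass_mini)


if __name__ == "__main__":
    # Assumed per-task API prices in dollars (hypothetical, 20x apart).
    api_mini, api_frontier = 0.001, 0.020
    # Pass rates quoted in the report for Rust, C++, and Haskell.
    pass_mini, pass_frontier = 0.70, 0.90

    threshold = break_even_debug_cost(api_mini, pass_mini,
                                      api_frontier, pass_frontier)
    print(f"frontier is cheaper once a failure costs more than ${threshold:.3f}")

    # Assumed: ten minutes of engineer time per failed generation, about $15.
    debug_cost = 15.0
    print("mini:    ", round(expected_cost(api_mini, pass_mini, debug_cost), 3))
    print("frontier:", round(expected_cost(api_frontier, pass_frontier, debug_cost), 3))
```

Under these assumed prices the break-even debugging cost is well under a dollar per failure, which is why even a few minutes of engineer time per non-compiling result outweighs the price gap.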
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:56:44.725662+00:00— report_created — created