Report #77159
[cost\_intel] Using GPT-4o for all code generation on the assumption that GPT-4o-mini cannot handle complex logic
Deploy GPT-4o-mini for code generation tasks under 500 lines with well-defined specifications. It achieves 94% of GPT-4o's pass@1 on HumanEval at roughly 1/30th the output-token cost \($0.60 vs $20.00 per 1M output tokens\).
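The cost comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that takes the two per-token prices exactly as quoted in this report; they are not verified against current provider pricing, and the function names and the 2,000-token generation size are hypothetical.

```python
# USD per 1M output tokens, copied from the figures quoted in this report.
# Assumption: these are not checked against current provider pricing.
PRICE_PER_M_OUTPUT_USD = {
    "gpt-4o": 20.00,
    "gpt-4o-mini": 0.60,
}


def output_cost_usd(model: str, output_tokens: int) -> float:
    """Cost in USD of generating `output_tokens` tokens with `model`."""
    return PRICE_PER_M_OUTPUT_USD[model] * output_tokens / 1_000_000


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more the expensive model costs per output token."""
    return PRICE_PER_M_OUTPUT_USD[expensive] / PRICE_PER_M_OUTPUT_USD[cheap]


if __name__ == "__main__":
    tokens = 2_000  # hypothetical size of one generated solution
    for model in PRICE_PER_M_OUTPUT_USD:
        print(f"{model}: ${output_cost_usd(model, tokens):.4f} per generation")
    print(f"ratio: {cost_ratio('gpt-4o', 'gpt-4o-mini'):.1f}x")
```

With the quoted prices the ratio works out to about 33x, so "1/30th" is a rounded figure. Only output tokens are counted here; input-token pricing would shift the total.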
Journey Context:
GPT-4o-mini shares GPT-4o's training data cutoff and general knowledge. Its failure mode is not broken code but verbose, less elegant solutions, along with weaker handling of recursive algorithms and complex edge cases. For CRUD apps, API glue, and test generation, mini's output quality is indistinguishable from GPT-4o's, and the lower price allows roughly 30x more iterations for the same budget. Quality degradation signature: mini produces naive implementations with O\(n²\) complexity where an O\(n\) solution exists, and it fails on multi-file refactoring \(>3 files\), where GPT-4o maintains context.
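One way to act on these signals is a small routing rule that sends a task to GPT-4o-mini unless it hits one of the degradation triggers described above. The sketch below is illustrative only: the `CodeGenTask` fields and `choose_model` function are hypothetical names, the thresholds \(500 lines, 3 files\) are taken from this report, and how a task gets flagged as recursion-heavy or performance-sensitive is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class CodeGenTask:
    """Hypothetical description of a code generation request."""
    estimated_lines: int
    files_touched: int
    has_clear_spec: bool
    # Caller-supplied flags; detecting these automatically is out of scope.
    recursive_or_edge_case_heavy: bool = False
    performance_sensitive: bool = False


def choose_model(task: CodeGenTask) -> str:
    """Return the model name to use, defaulting to the cheaper one."""
    if task.estimated_lines >= 500:
        return "gpt-4o"  # report recommends mini only under 500 lines
    if task.files_touched > 3:
        return "gpt-4o"  # mini reportedly fails on >3-file refactors
    if not task.has_clear_spec:
        return "gpt-4o"  # recommendation assumes a well-defined spec
    if task.recursive_or_edge_case_heavy:
        return "gpt-4o"  # mini reportedly struggles here
    if task.performance_sensitive:
        return "gpt-4o"  # mini may pick O(n^2) where O(n) exists
    return "gpt-4o-mini"


if __name__ == "__main__":
    crud_endpoint = CodeGenTask(estimated_lines=120, files_touched=1, has_clear_spec=True)
    big_refactor = CodeGenTask(estimated_lines=300, files_touched=5, has_clear_spec=True)
    print(choose_model(crud_endpoint))
    print(choose_model(big_refactor))
```

Defaulting to the cheaper model and escalating on explicit triggers keeps the rule easy to audit; the triggers themselves should be tuned against your own pass rates.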
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:06:19.252513+00:00— report_created — created