Report #53985
[cost\_intel] When does GPT-4o produce worse code than GPT-4 Turbo despite being newer?
Avoid GPT-4o for code requiring >500 lines of coherent architecture or complex debugging. Use GPT-4 Turbo \(gpt-4-0125-preview\) in the cases where 4o exhibits 'laziness': generating stubs instead of implementations, or omitting error handling to save output tokens.
Journey Context:
GPT-4o is 2x faster and 50% cheaper than Turbo \($5/1M vs $10/1M input tokens\), which leads teams to switch entirely. However, in long-context coding 4o tends to compress its output, truncating implementations as if to stay within a perceived token limit. In file generation tasks >1000 lines specifically, 4o produces placeholder comments like '// implementation here' where Turbo completes the logic. The cost savings vanish when developers must fill the gaps by hand. Reported workaround: use 4o for scaffolding and Turbo for core algorithm implementation, or explicitly set max\_tokens >4000 so the output cap does not cut 4o off. Note that max\_tokens only raises the ceiling on output length; it does not by itself force the model to keep generating.
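The scaffolding/core split described above could be sketched roughly as follows. This is a minimal illustration, not a verified implementation: the function names `pick_model` and `find_stub_lines`, the task labels, the stub patterns, and the use of the report's 500-line figure as a routing threshold are all assumptions made for the example. Only the two model names come from the report. The code makes no API calls; it only decides which model to request and flags output that looks like a placeholder.

```python
import re
from typing import List

# Model names as given in the report.
SCAFFOLD_MODEL = "gpt-4o"
CORE_MODEL = "gpt-4-0125-preview"

# Assumption: reuse the report's ">500 lines" rule of thumb as the cutoff.
LONG_OUTPUT_LINES = 500

# Hypothetical task labels that should always go to the Turbo model.
CORE_TASKS = ("core_algorithm", "debugging")

# Illustrative (not exhaustive) patterns for placeholder comments.
STUB_PATTERNS = [
    re.compile(r"(//|#)\s*(your\s+)?implementation\s+(goes\s+)?here", re.IGNORECASE),
    re.compile(r"(//|#)\s*rest of (the )?(code|implementation)", re.IGNORECASE),
    re.compile(r"(//|#)\s*\.\.\.\s*$"),
]


def pick_model(task_kind: str, expected_lines: int) -> str:
    """Send core logic, debugging, and long outputs to Turbo; the rest to 4o."""
    if task_kind in CORE_TASKS:
        return CORE_MODEL
    if expected_lines > LONG_OUTPUT_LINES:
        return CORE_MODEL
    return SCAFFOLD_MODEL


def find_stub_lines(code: str) -> List[int]:
    """Return 1-based line numbers that look like placeholder comments."""
    hits = []
    for number, line in enumerate(code.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STUB_PATTERNS):
            hits.append(number)
    return hits
```

One possible use is to run `find_stub_lines` on every 4o response and retry the request with the model returned by `pick_model("core_algorithm", ...)` when any line is flagged. A response that was cut off by the token cap, rather than stubbed by the model, is a separate case and needs its own check.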
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:06:40.718367+00:00— report_created — created