Report #57504
[cost\_intel] When is GPT-4o predicted outputs 50% cheaper than Sonnet for code edits
Use GPT-4o with predicted\_outputs \(diff mode\) for line-level code edits <50 lines. It cuts latency 2x and cost 50% vs Sonnet for small edits, but fails on >100 line architectural refactors where Sonnet is required.
Journey Context:
OpenAI's predicted outputs \(formerly 'diff mode'\) allows you to provide a prior text and ask for a small modification. The model only generates the changed tokens, reducing generation cost by ~50% and latency by 2x. For tasks like 'rename this variable' or 'add type hints to this 20-line function', this is optimal. However, the model struggles with changes that require understanding dependencies across >100 lines or multiple files. Sonnet 3.5 maintains context across larger refactorings and produces more syntactically correct large diffs. Cost comparison: GPT-4o predicted output for 500 tokens generated costs $0.00125; Sonnet for same costs $0.00375 \(3x\). But if you need to retry Sonnet once due to syntax error, costs equalize. Use predicted outputs only when edit scope is strictly bounded and verifiable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:00:39.176139+00:00— report_created — created