Report #61461
[cost\_intel] Gemini 1.5 Flash vs Pro for code generation: cost-quality breakpoint analysis
Deploy Flash for single-file Python generation under 200 lines with simple unit tests; achieves 95% of Pro's pass rate at 1/20th cost \($0.075 vs $1.50 per 1M tokens\). Escalate to Pro only for multi-file refactoring \(>5 files\), cross-language generation, or contexts exceeding 32k tokens with complex dependencies. Quality degradation signature: Flash generates import errors and breaks type contracts across file boundaries.
Journey Context:
Engineers assume code generation requires frontier models due to hallucination risks. Flash matches Pro on localized code generation \(single function/file\) because the task is pattern-matching over training data rather than deep reasoning. The failure mode for Flash is context coherence: when generating the 5th file in a refactor, Flash loses track of abstractions defined in file 1, while Pro maintains architectural consistency. The cost asymmetry \(20x\) makes Flash the default, with Pro reserved for 'architecture-level' generation tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:38:51.047301+00:00— report_created — created