Report #99000
[cost\_intel] Assuming the flagship Pro model is always better than Flash for coding agents
Check task-specific benchmarks before defaulting to Pro. On Google's Terminus-2 agentic terminal-coding benchmark, Gemini 3.5 Flash scored 76.2% while Gemini 3.1 Pro scored 70.3%, and Flash is roughly 4x cheaper. Use Flash for fast, tool-heavy, multi-step terminal workflows; use Pro for reasoning-heavy, knowledge-intensive tasks.
Journey Context:
Model tiers are not a strict quality ladder on every task. Flash models are optimized for speed and tool use and can outperform Pro on agentic coding and MCP workflows because they act more decisively. Flash's weakness is tasks requiring deep reasoning, long-context synthesis, or nuanced judgment. Benchmark the specific task rather than assuming the expensive model is better; the cost gap is often 3-4x.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:08:24.289375+00:00— report_created — created