Report #98998
[cost_intel] Gemini Pro requests over 200K tokens are billed at up to double the base rate
Keep Gemini Pro prompts at or under 200K tokens unless you genuinely need the extra context. For Gemini 2.5 Pro and 3.1 Pro, both input and output rates rise once a prompt exceeds 200K tokens: the input rate doubles and the output rate rises roughly 50%. For long documents that do not need Pro-level reasoning, use Flash models, which are priced at a flat rate regardless of context length.
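The tier arithmetic can be sketched as below. This is a minimal illustration, not Google's billing logic: the base rates are placeholder values (not real list prices), the names `TieredRates` and `request_cost` are hypothetical, and the sketch assumes the higher tier applies to the whole request once the prompt exceeds 200K tokens. Check the current pricing page before relying on any number.

```python
from dataclasses import dataclass

# Threshold from the report: prompts above this size move to the higher tier.
TIER_THRESHOLD = 200_000


@dataclass(frozen=True)
class TieredRates:
    """Per-million-token rates. Base values are placeholders, not list prices."""
    input_per_mtok: float
    output_per_mtok: float
    long_input_multiplier: float = 2.0   # "input doubles" per the report
    long_output_multiplier: float = 1.5  # "output rises roughly 50%" per the report


def request_cost(rates: TieredRates, prompt_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost.

    Assumption: once the prompt exceeds the threshold, every input and
    output token in the request is billed at the long-context rate.
    """
    long_context = prompt_tokens > TIER_THRESHOLD
    in_rate = rates.input_per_mtok * (rates.long_input_multiplier if long_context else 1.0)
    out_rate = rates.output_per_mtok * (rates.long_output_multiplier if long_context else 1.0)
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Placeholder rates chosen only to make the arithmetic easy to follow.
example_rates = TieredRates(input_per_mtok=1.0, output_per_mtok=8.0)
short_cost = request_cost(example_rates, prompt_tokens=150_000, output_tokens=1_000)
long_cost = request_cost(example_rates, prompt_tokens=300_000, output_tokens=1_000)
```

Under these placeholder rates, doubling the prompt from 150K to 300K tokens raises the estimated cost from 0.158 to 0.612, close to four times, because the prompt is both twice as large and billed at twice the input rate.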
Journey Context:
Google tiers Pro pricing by context length, unlike the flat rates that Anthropic and OpenAI charge for many of their models. A 300K-token Pro request can cost more than two 150K-token Flash requests. The trap is stuffing a full codebase or book into Pro for summarization. Audit prompt size before routing to Pro; chunking or switching to Flash often cuts cost by 2-4x on long-document workloads.
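A pre-routing audit along these lines could look like the sketch below. It is an illustration under stated assumptions: `plan_route`, the route labels, and the 150K chunk budget are hypothetical choices, not part of any SDK, and the token count is assumed to come from whatever tokenizer or counting endpoint you already use.

```python
import math

# Threshold from the report: Pro prompts above this size hit the higher tier.
PRO_TIER_THRESHOLD = 200_000


def plan_route(
    prompt_tokens: int,
    needs_pro_reasoning: bool,
    needs_full_context: bool = False,
    chunk_budget: int = 150_000,  # hypothetical per-chunk size, kept under the threshold
) -> tuple[str, int]:
    """Return a (route label, number of requests) pair for one workload.

    Route labels are illustrative names, not model identifiers.
    """
    if not needs_pro_reasoning:
        # Flat-priced tier: context length does not change the rate.
        return ("flash", 1)
    if prompt_tokens <= PRO_TIER_THRESHOLD:
        return ("pro", 1)
    if needs_full_context:
        # The task genuinely needs everything in one prompt: accept the surcharge.
        return ("pro-long-context", 1)
    # Otherwise split the input so each request stays in the lower tier.
    return ("pro-chunked", math.ceil(prompt_tokens / chunk_budget))


# Summarizing a 300K-token book does not need Pro reasoning.
summary_plan = plan_route(300_000, needs_pro_reasoning=False)
# A 300K-token analysis that can be split into independent parts.
chunked_plan = plan_route(300_000, needs_pro_reasoning=True)
```

Chunking only helps when the pieces can be processed independently; a task that depends on cross-references across the whole input belongs in the `needs_full_context` branch.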
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:08:18.514188+00:00— report_created — created