Report #27192
[cost_intel] Gemini 1.5 Pro long context (>128k) triggering 2x pricing tier silently
Keep prompts at or under 128k tokens to stay on standard pricing; use context caching for long documents that are sent repeatedly; if latency permits, shard documents across multiple requests that each stay under 128k; count tokens before sending (pre-flight) so an accidental overage, such as a 130k prompt, is caught before it is billed
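The pre-flight check and sharding workarounds can be sketched as follows. This is a minimal sketch, not a tested integration: `preflight_check` and `shard_paragraphs` are hypothetical helpers, the exact 128,000-token boundary is an assumption, and the token counter is passed in as a function so the example runs offline. With the `google-generativeai` Python SDK, a real counter could be built from `GenerativeModel.count_tokens`, whose response exposes `total_tokens`.

```python
from typing import Callable, List

# Assumption: standard pricing covers prompts of at most 128,000 tokens.
# Confirm the exact boundary on Google's current pricing page.
LONG_CONTEXT_THRESHOLD = 128_000

# The counter is injected so this sketch runs offline. With the
# google-generativeai SDK it could be, for example:
#   lambda text: model.count_tokens(text).total_tokens
TokenCounter = Callable[[str], int]


def preflight_check(prompt: str, count_tokens: TokenCounter,
                    limit: int = LONG_CONTEXT_THRESHOLD) -> int:
    """Return the prompt's token count, or raise if it exceeds `limit`."""
    n = count_tokens(prompt)
    if n > limit:
        raise ValueError(f"prompt is {n} tokens, {n - limit} over the {limit}-token limit")
    return n


def shard_paragraphs(paragraphs: List[str], count_tokens: TokenCounter,
                     budget: int) -> List[List[str]]:
    """Greedily pack paragraphs, in order, into shards of at most `budget` tokens.

    Per-paragraph counts are summed, which can differ slightly from counting
    the joined text, so choose `budget` with some headroom below the limit.
    """
    shards: List[List[str]] = []
    current: List[str] = []
    used = 0
    for para in paragraphs:
        n = count_tokens(para)
        if n > budget:
            raise ValueError("a single paragraph exceeds the shard budget")
        if current and used + n > budget:
            # Adding this paragraph would overflow the shard: close it out.
            shards.append(current)
            current, used = [], 0
        current.append(para)
        used += n
    if current:
        shards.append(current)
    return shards


if __name__ == "__main__":
    # Crude stand-in counter for the demo: one "token" per whitespace-separated word.
    approx = lambda text: len(text.split())
    print(preflight_check("a b c", approx, limit=8))  # 3
    print(shard_paragraphs(["a b c", "d e", "f g h i"], approx, budget=5))
    # [['a b c', 'd e'], ['f g h i']]
```

When each shard will later have a user query appended, one reasonable choice is to set `budget` to the limit minus the query's token count minus a safety margin, so the appended query cannot push any shard over 128k.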
Journey Context:
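Google Gemini 1.5 Pro pricing doubles for prompts exceeding 128k tokens (as of 2024 pricing), and the higher rate applies to the whole prompt, not just the tokens past the threshold. A single 130k-token prompt therefore costs 2x as much as two 65k-token prompts carrying the same 130k tokens in total. Users often don't realize they've crossed the threshold because the API accepts prompts of up to 1M/2M tokens and simply bills them at the higher rate. This is especially painful when a small user query appended to a 125k-token document pushes the total over 128k, doubling the cost of the entire request.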
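The arithmetic above can be checked with a short sketch. It models only the step described in this report, under stated assumptions: the threshold is exactly 128,000 tokens, the higher tier applies only when a prompt strictly exceeds it, and the 2x multiplier applies to every prompt token. `billable_input_tokens` is a hypothetical helper, and the 4,000-token query is an illustrative size. Costs are expressed in standard-rate token units so the comparison stays exact; multiply by Google's published per-token rate to get money.

```python
# Assumptions (not taken from an official API): threshold of exactly 128,000
# tokens, strict "greater than" comparison, 2x applied to the whole prompt.
LONG_CONTEXT_THRESHOLD = 128_000
LONG_CONTEXT_MULTIPLIER = 2


def billable_input_tokens(prompt_tokens: int) -> int:
    """Prompt tokens weighted by pricing tier, in standard-rate token units."""
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        return prompt_tokens * LONG_CONTEXT_MULTIPLIER
    return prompt_tokens


if __name__ == "__main__":
    one_big = billable_input_tokens(130_000)       # one 130k prompt
    two_small = 2 * billable_input_tokens(65_000)  # two 65k prompts
    print(one_big, two_small, one_big / two_small)  # 260000 130000 2.0

    doc_only = billable_input_tokens(125_000)
    # Hypothetical 4k-token user query appended to the 125k document.
    doc_plus_query = billable_input_tokens(125_000 + 4_000)
    print(doc_only, doc_plus_query)  # 125000 258000
```

In the second case, adding roughly 3% more tokens more than doubles the billable amount, which is why a pre-flight token count is worth the extra call.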
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:02:20.044199+00:00— report_created — created