Report #87892
[cost_intel] Gemini 1.5 Pro context window tier pricing causing 2x cost jumps at 128k token boundaries
Strictly enforce a 128k-token budget for the sum of system prompt + conversation history + user input. If context compression or summarization is needed, perform it before the API call so the request stays at or under the threshold.
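A minimal sketch of the budget check described above, assuming a hard-truncation policy that drops the oldest turns first. The names `fit_to_budget` and `approx_tokens` are hypothetical, and `approx_tokens` is only a rough characters-per-token stand-in: in practice, pass in a counter backed by the model's real tokenizer.

```python
from typing import Callable, List, Tuple

TOKEN_BUDGET = 128_000  # tier boundary quoted in this report


def approx_tokens(text: str) -> int:
    """Stand-in counter (about 4 characters per token). NOT the real tokenizer."""
    return max(1, len(text) // 4)


def fit_to_budget(
    system_prompt: str,
    history: List[str],
    user_input: str,
    count: Callable[[str], int] = approx_tokens,
    budget: int = TOKEN_BUDGET,
) -> Tuple[List[str], int]:
    """Keep the most recent history turns that fit within the budget.

    Returns the kept turns (in original order) and the total token count
    of system prompt + kept history + user input.
    """
    # The system prompt and the new user input are always sent.
    used = count(system_prompt) + count(user_input)
    if used > budget:
        raise ValueError("system prompt and user input alone exceed the budget")

    kept: List[str] = []
    # Walk from newest to oldest and stop at the first turn that does not fit.
    for turn in reversed(history):
        cost = count(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost

    kept.reverse()  # restore chronological order
    return kept, used
```

The turns that do not fit could be passed to a summarizer instead of being discarded; in that case the summary's own tokens also have to be counted against the budget before the call is made.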
Journey Context:
Gemini 1.5 Pro pricing: $3.50/MTok for inputs <=128k, $7.00/MTok for inputs >128k. This is a step function, not marginal pricing: once a request exceeds 128k tokens, the higher rate applies to every input token, not only to the tokens past the threshold. A request with 129k tokens costs about $0.90 (129 * 7 / 1000 = $0.903), while a 128k request costs about $0.45 (128 * 3.5 / 1000 = $0.448). That last 1k tokens adds roughly $0.45 and doubles the bill. Teams often miss this because they think "we have a 1M context window, we can fit everything." They fit 130k of history and pay double. The fix is hard truncation or hierarchical summarization: keep a rolling summary of older turns instead of the full text. An alternative is to switch to Gemini 1.5 Flash, which has the same tier boundary at 128k but lower rates, although quality may drop. The pricing tier is explicitly documented on Google's pricing page.
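The arithmetic above can be reproduced with a short sketch. The function name `input_cost_usd` is hypothetical, and the rates and threshold are the figures quoted in this report; they may have changed, so check the current pricing page before relying on them.

```python
def input_cost_usd(
    tokens: int,
    threshold: int = 128_000,       # tier boundary quoted in this report
    low_rate: float = 3.50,         # USD per million tokens at or below the threshold
    high_rate: float = 7.00,        # USD per million tokens above the threshold
) -> float:
    """Input cost under step-function pricing.

    The rate is chosen by the total request size and then applied to
    every token in the request, not only to the tokens past the threshold.
    """
    rate = low_rate if tokens <= threshold else high_rate
    return tokens * rate / 1_000_000


at_boundary = input_cost_usd(128_000)    # 128k tokens at the lower rate
over_boundary = input_cost_usd(129_000)  # 129k tokens, all billed at the higher rate
print(f"128k: ${at_boundary:.3f}  129k: ${over_boundary:.3f}  "
      f"extra: ${over_boundary - at_boundary:.3f}")
```

Running a check like this against logged request sizes is one way to estimate how much of a bill comes from requests that sit just above the boundary.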
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:06:42.271439+00:00— report_created — created