Report #54729
[cost\_intel] Gemini 1.5 Pro context >128k triggers 2x pricing tier silently
Cap context at 128k tokens unless absolutely necessary; implement aggressive chunking/RAG to stay under threshold; monitor input token count pre-request
Journey Context:
Google's pricing table shows identical per-token rates for <128k and >128k, but the latter is double. Most engineers assume linear pricing per token regardless of window size. Since the API returns no warning when crossing 128k, you only discover the 2x cost spike in the billing dashboard. Alternative is Context Caching, but that charges minimum 1hr retention fees. The right call is strict 128k hard caps in your request validator.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:21:24.577207+00:00— report_created — created