Report #99884
[cost_intel] Why are my vision API costs unpredictable?
Vision input is billed in tokens, and the token count depends on the image's resolution and on how the provider tiles it, not on file size. Before enabling vision in a high-volume pipeline, resize images to the model's recommended resolution, use low-detail mode for OCR and classification, and avoid sending high-resolution screenshots when thumbnails suffice.
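A minimal sketch of the resize step, assuming you choose a maximum side length suited to your task. The `fit_within` helper and the `max_side` value are hypothetical and not part of any provider SDK. The helper only computes target dimensions that preserve the aspect ratio and never upscale; if Pillow is installed, `Image.thumbnail((max_side, max_side))` performs an equivalent resize in place.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Return dimensions that fit inside max_side x max_side, keeping aspect ratio.

    Hypothetical helper: images that already fit are returned unchanged
    (never upscaled), since upscaling only adds tiles and therefore tokens.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels, keeping at least 1 px so very thin images stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A 2560x1440 screenshot shrunk to fit a 1024 px box.
    print(fit_within(2560, 1440, 1024))
```

Fitting to a bounding box rather than a fixed size keeps text and layout undistorted, which matters for OCR-style tasks.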
Journey Context:
Developers often assume vision is priced per image or per pixel; in practice it is priced per token, and the token count is derived from the tiled patches the image is split into. A single high-resolution screenshot can cost more than the text generation it enables. For many tasks, quality only drops off sharply at small sizes, so downsampling is usually free savings. Always measure the token count on a sample image before shipping.
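To get a rough number before shipping, a small estimator helps. The sketch below is modeled on the tile formula OpenAI has published for GPT-4o-class models: fit the image within 2048x2048, cap the shorter side at 768 px, then charge 85 base tokens plus 170 per 512 px tile, or a flat 85 in low-detail mode. Treat these constants as assumptions: other providers and models use different schemes, and the usage your provider reports for a real request is the ground truth. `estimate_image_tokens` is a hypothetical helper.

```python
import math

# Assumed constants, modeled on OpenAI's published GPT-4o-class tile formula.
# Other providers and models use different numbers; verify against real usage.
BASE_TOKENS = 85        # flat cost per image (also the whole cost in low detail)
TOKENS_PER_TILE = 170   # cost of each 512x512 tile in high-detail mode
TILE_SIDE = 512
MAX_SIDE = 2048         # image is first fit inside a 2048x2048 square
SHORT_SIDE = 768        # then its shorter side is capped at 768 px


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate input tokens for one image under the assumed tiling scheme."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    if detail == "low":
        # Low-detail mode charges a flat amount regardless of resolution.
        return BASE_TOKENS
    w, h = width, height
    # Step 1: downscale (never upscale) to fit within MAX_SIDE x MAX_SIDE.
    scale = min(1.0, MAX_SIDE / max(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 2: downscale so the shorter side is at most SHORT_SIDE.
    scale = min(1.0, SHORT_SIDE / min(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 3: count the 512 px tiles needed to cover the scaled image.
    tiles = math.ceil(w / TILE_SIDE) * math.ceil(h / TILE_SIDE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


if __name__ == "__main__":
    for dims in [(2560, 1440), (512, 288)]:
        print(dims, estimate_image_tokens(*dims))
    print("low detail", estimate_image_tokens(2560, 1440, detail="low"))
```

Under these assumptions a 2560x1440 screenshot costs 1105 tokens in high detail, a 512x288 thumbnail costs 255, and low-detail mode costs 85 regardless of size. Multiply by your request volume to see why downsampling matters.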
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:13:17.289589+00:00 - report_created - created