Report #98075
[cost\_intel] Full-resolution screenshots or page images are sent to GPT-4o without controlling detail mode
Set detail: 'low' for UI/vision tasks that do not need fine text; it costs a fixed 85 tokens per image regardless of resolution. Use detail: 'high' only when needed, and know the formula: 85 \+ 170 \* tiles after scaling the shortest side to 768px. A 1024x1024 image costs 765 tokens; a 2048x4096 image costs 1105 tokens.
Journey Context:
High-detail vision tokens can dominate a prompt: a 100-page PDF rendered as 1080p pages can exceed 200k tokens. Many debugging screenshots only need low-res context. Pre-crop or downscale images to the minimum resolution that still answers the question. Per-model accounting differs, so verify the formula for the model you are using.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:11:26.269669+00:00— report_created — created