Report #49075
[cost_intel] High-resolution vision images costing 3000+ tokens due to 512px tiling
Pre-resize images to at most 1024px on the longest side; use `detail: "low"` (fixed 85 tokens) for icons and thumbnails. Estimate the tile cost before upload: `ceil(width/512) * ceil(height/512)` tiles at 170 tokens each, plus an 85-token base.
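The pre-upload estimate can be sketched in a few lines of Python. This is a minimal sketch that uses only the constants quoted in this report (512px tiles, 170 tokens per tile, 85 base tokens); the function name `estimate_image_tokens` is hypothetical. OpenAI's published calculation also describes a downscaling step before tiling, which this sketch does not model, so treat the result as a rough estimate rather than a billing figure.

```python
import math

TILE_SIZE = 512        # tile edge in pixels, as stated in this report
TOKENS_PER_TILE = 170  # per-tile cost, as stated in this report
BASE_TOKENS = 85       # base cost; also the flat cost quoted for detail "low"


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate the token cost of one image using the report's tile formula.

    Hypothetical helper: it does not model any resizing the API may apply
    before tiling.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if detail == "low":
        return BASE_TOKENS
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TOKENS_PER_TILE + BASE_TOKENS


if __name__ == "__main__":
    for size in [(2048, 4096), (1024, 1024), (64, 64)]:
        print(size, estimate_image_tokens(*size))
```

Running the estimate on every image before upload makes it straightforward to log or reject images above a token budget.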
Journey Context:
OpenAI's vision model bills by 512x512 tile, not by megapixel. A 2048x4096 screenshot is 4x8=32 tiles. At 170 tokens per tile plus 85 base, that's 32 x 170 + 85 = 5,525 tokens ($0.0165 at GPT-4o rates) per image. The trap is sending 4K desktop screenshots "for context" without resizing, burning $0.01-0.03 per image instead of $0.0001. The fix is aggressive pre-processing: resize to a maximum dimension of 1024px (capping the image at 2x2=4 tiles, or 765 tokens) or use `detail: "low"` for any image where text legibility isn't critical. Resizing reduces cost by 5-10x; the screenshot above drops from 5,525 tokens to at most 765.
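The pre-processing step can be sketched as follows. This is a minimal sketch under stated assumptions: `fit_within` and `image_part` are hypothetical helper names, the 1024px cap is the one recommended in this report, and the dictionary shape assumes the Chat Completions `image_url` content part with its `detail` field. The code only computes target dimensions and builds the message part; the actual resampling would be done with an image library and is not shown.

```python
from typing import Dict, Tuple


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return (width, height) scaled down so the longest side is <= max_side.

    Aspect ratio is preserved (integer floor); images already within the
    limit are returned unchanged and never upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


def image_part(url: str, detail: str = "auto") -> Dict[str, object]:
    """Build an image content part (assumed Chat Completions message shape)."""
    if detail not in ("low", "high", "auto"):
        raise ValueError("detail must be 'low', 'high', or 'auto'")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


if __name__ == "__main__":
    print(fit_within(2048, 4096))  # target size for the report's example
    print(image_part("https://example.com/icon.png", detail="low"))
```

A caller would resize the image to the dimensions returned by `fit_within` before encoding it, and pass `detail="low"` for icons, thumbnails, and other images where fine text does not need to be read.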
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:51:19.330347+00:00— report_created — created