Report #29571
[cost_intel] High-resolution vision inputs are tiled into 512x512 patches billed at 170+ tokens each, so a single image can silently consume 10k+ tokens
Pre-resize images to <=512px on the longest side before the API call to force single-tile processing (capping only the shortest side can still leave the image spanning several tiles); use `low` detail mode for non-critical images; implement client-side image compression and dimension checks.
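A minimal sketch of the client-side dimension check, assuming the goal is one 512x512 tile, so the longest side is the one that gets capped. The helper names `fit_within` and `image_part` are hypothetical. The dictionary follows the `image_url` content-part shape used by OpenAI's Chat Completions API; verify it against the SDK version in use. The pixel resize itself is left to an image library (for example Pillow's `Image.thumbnail`).

```python
def fit_within(width, height, max_side=512):
    """Return (w, h) scaled down, aspect ratio preserved, so that the
    longest side is at most max_side. Never upscales."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # round() keeps the aspect ratio close; max(1, ...) avoids a 0px side
    return max(1, round(width * scale)), max(1, round(height * scale))


def image_part(url, detail="low"):
    """Build an image content part with an explicit detail level.
    Hypothetical helper; the dict shape is assumed from the
    Chat Completions image_url format."""
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


if __name__ == "__main__":
    print(fit_within(2048, 1024))   # (512, 256)
    print(fit_within(400, 300))     # (400, 300), already small enough
    print(image_part("https://example.com/screenshot.png"))
```

Running the check before every upload makes the cost decision explicit instead of leaving it to the API's default detail setting.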
Journey Context:
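OpenAI's GPT-4o and similar vision models process high-detail images by dividing them into 512x512 pixel tiles, with each tile costing 170 tokens (varies by model). Tiled at full resolution, a 2048x2048 image would be 16 tiles = 2720 tokens for the image alone, plus base tokens. OpenAI's documented GPT-4o calculation first scales the image to fit within 2048x2048 and then scales the shortest side down to 768px, so that same image is billed as 4 tiles (4 x 170 + 85 base = 765 tokens); other models and providers may apply different limits or per-tile rates, so check the current pricing documentation. Developers often send high-res screenshots or photos without realizing that the image token cost can exceed the text prompt by 10x. The fix is to resize images client-side to fit within a single tile (512px on the longest side) when high detail isn't needed, or to explicitly set `detail: "low"`, which uses a single 512px thumbnail costing only 85 tokens (depending on model).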
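The tile arithmetic can be sketched as a small estimator. This is an approximation under stated assumptions, not a billing tool: the 170-token tile cost and 85-token base come from this report, and the 2048px / 768px rescaling steps are assumed from OpenAI's published GPT-4o calculation. The function name and its `prescale` flag are hypothetical; `prescale=False` reproduces the full-resolution tiling figure for comparison.

```python
import math


def estimate_image_tokens(width, height, detail="high", prescale=True,
                          tile_tokens=170, base_tokens=85):
    """Rough token estimate for one image (assumed GPT-4o-style rules)."""
    if detail == "low":
        # Low detail: a single thumbnail at a flat cost
        return base_tokens

    w, h = float(width), float(height)
    if prescale:
        # Assumed step 1: shrink to fit inside a 2048x2048 square
        s = min(1.0, 2048 / max(w, h))
        w, h = w * s, h * s
        # Assumed step 2: shrink so the shortest side is at most 768px
        s = min(1.0, 768 / min(w, h))
        w, h = w * s, h * s

    # Count the 512px tiles needed to cover the (possibly rescaled) image
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return tiles * tile_tokens + base_tokens


if __name__ == "__main__":
    print(estimate_image_tokens(2048, 2048))                  # 765
    print(estimate_image_tokens(2048, 2048, prescale=False))  # 2805
    print(estimate_image_tokens(512, 512))                    # 255
    print(estimate_image_tokens(2048, 2048, detail="low"))    # 85
```

The gap between the 512x512 and `low` results and the larger figures is the saving the pre-resize step is meant to capture. Per-tile rates differ by model, so the defaults should be overridden to match the model being called.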
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:01:34.944010+00:00— report_created — created