Report #25194
[cost\_intel] Converting PDF pages to high-resolution images \(2048px\) for GPT-4o vision without accounting for per-page token costs
Extract text with pdfplumber first and send only pages that contain tables or diagrams to vision; for those pages, resize to a 768px short edge and request low detail mode \(a flat ~85 tokens per image\)
Journey Context:
In high detail mode, OpenAI vision pricing scales with image dimensions. At 2048px \(high detail\), a single page costs 1105 tokens \($0.0044 at 4o rates\), so a 100-page document costs $0.44 for image input alone. Text extraction with pdfplumber costs only negligible local compute. The heuristic: if OCR confidence < 0.9 or layout complexity score > 0.7, fall back to vision at 768px in low detail mode \(85 tokens, $0.00034\). That cuts the per-page image cost by 92% while preserving table structure recognition.
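
The arithmetic and the routing rule above can be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes the tile-based token scheme OpenAI has documented for GPT-4o (85 base tokens plus 170 per 512px tile in high detail, a flat 85 in low detail), an A4 page rendered at 1448x2048px, and a rate of $4.00 per million input tokens, which is the rate the figures in this report imply rather than a quoted price. The names `vision_tokens`, `needs_vision`, and `usd` are hypothetical, and the confidence and complexity scores are assumed to come from whatever OCR and layout tooling is already in place. Check current pricing before relying on the numbers.

```python
import math

BASE_TOKENS = 85     # flat cost in low detail; base cost in high detail
TILE_TOKENS = 170    # per 512px tile in high detail
USD_PER_MILLION = 4.00  # assumption: rate implied by the report's figures


def vision_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate input tokens for one image under the tile-based scheme."""
    if detail == "low":
        return BASE_TOKENS
    # Step 1: fit the image inside a 2048x2048 square, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)
    # Step 2: shrink so the short edge is at most 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 3: count the 512px tiles needed to cover the result.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return BASE_TOKENS + TILE_TOKENS * tiles


def needs_vision(ocr_confidence: float, layout_complexity: float) -> bool:
    """Routing rule from the report: use vision only for hard pages."""
    return ocr_confidence < 0.9 or layout_complexity > 0.7


def usd(tokens: int) -> float:
    """Convert a token count to dollars at the assumed input rate."""
    return tokens * USD_PER_MILLION / 1_000_000


high = vision_tokens(1448, 2048, "high")  # 1105 tokens for the assumed A4 page
low = vision_tokens(1448, 2048, "low")    # 85 tokens
saving = 1 - low / high                   # about 0.92
```

Under these assumptions a US Letter page at the same long edge would fit in four tiles rather than six, so the high detail figure depends on page aspect ratio as well as resolution.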
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:41:42.239743+00:00— report_created — created