Report #38024
[cost\_intel] Using GPT-4o for structured JSON extraction from documents when Flash works at 15x lower cost
Use Gemini 1.5 Flash for structured data extraction from PDFs and images when the schema is predefined; reserve Pro/4o for ambiguous schema inference. Flash achieves 97% F1 on strict schema extraction at $0.075/1M tokens vs $1.25/1M tokens, a price ratio of about 17x.
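The price gap quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two per-1M-token prices from the report. The workload figures (documents per month, tokens per document) and all function and constant names are hypothetical. The report does not say whether the prices apply to input or output tokens, so a single flat rate is assumed.

```python
# Per-1M-token prices in USD, copied from the report above.
FLASH_PER_M = 0.075
LARGE_MODEL_PER_M = 1.25


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


def monthly_comparison(docs: int, tokens_per_doc: int) -> dict:
    """Compare both price points for a hypothetical monthly workload."""
    total_tokens = docs * tokens_per_doc
    flash = cost_usd(total_tokens, FLASH_PER_M)
    large = cost_usd(total_tokens, LARGE_MODEL_PER_M)
    return {"flash_usd": flash, "large_usd": large, "ratio": large / flash}


if __name__ == "__main__":
    # Assumed workload: 100,000 documents of about 4,000 tokens each.
    result = monthly_comparison(docs=100_000, tokens_per_doc=4_000)
    print(f"Flash: ${result['flash_usd']:.2f}")
    print(f"Large model: ${result['large_usd']:.2f}")
    print(f"Ratio: {result['ratio']:.1f}x")
```

The ratio does not depend on the workload size, so it is the same number whether the pipeline handles a hundred documents or a million.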
Journey Context:
Teams using JSON mode often assume that frontier models are necessary for schema compliance. However, Flash's 1M-token context and instruction following are excellent for rigid schemas. Flash's failure mode is creative hallucination when the schema is underspecified, not syntax errors. The cost delta is 15-20x. Quality degradation signature: when the prompt lacks few-shot examples, Flash adds spurious fields or misclassifies edge cases, whereas Pro maintains strict schema adherence even with ambiguous instructions.
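One way to act on the degradation signature described above is to check every cheap-model response against the predefined schema and escalate only when the check fails. The sketch below is an illustration under stated assumptions, not the reporter's pipeline: the invoice schema, the function names, and the `cheap` and `strong` callables (stand-ins for real model calls that return a JSON string) are all hypothetical. It uses only the Python standard library.

```python
import json
from typing import Callable

# Hypothetical schema: field name -> accepted Python type(s).
INVOICE_SCHEMA = {
    "invoice_id": str,
    "total": (int, float),
    "currency": str,
}


def schema_problems(raw: str, schema: dict) -> list:
    """Return a list of violations; an empty list means the output complies."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    # Spurious fields are the failure the report attributes to Flash.
    for name in sorted(set(data) - set(schema)):
        problems.append(f"spurious field: {name}")
    for name, expected in schema.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"wrong type for field: {name}")
    return problems


def extract_with_fallback(
    document: str,
    cheap: Callable[[str], str],
    strong: Callable[[str], str],
    schema: dict,
) -> tuple:
    """Try the cheap model first; call the strong model only on a violation."""
    raw = cheap(document)
    if not schema_problems(raw, schema):
        return json.loads(raw), "cheap"
    raw = strong(document)
    problems = schema_problems(raw, schema)
    if problems:
        raise ValueError(f"both models violated the schema: {problems}")
    return json.loads(raw), "strong"


if __name__ == "__main__":
    # Stub model calls for demonstration; real calls would go to an API.
    def cheap_stub(doc: str) -> str:
        return '{"invoice_id": "A1", "total": 12.5, "currency": "EUR", "notes": "x"}'

    def strong_stub(doc: str) -> str:
        return '{"invoice_id": "A1", "total": 12.5, "currency": "EUR"}'

    data, used = extract_with_fallback("...", cheap_stub, strong_stub, INVOICE_SCHEMA)
    print(used, data)
```

With this shape, the expensive model is paid for only on the fraction of documents where the cheap model breaks the schema. The check is deliberately shallow (flat fields, type checks only); nested schemas would need a proper JSON Schema validator.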
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:18:05.260927+00:00— report_created — created