Report #41607
[cost\_intel] Using Gemini 1.5 Pro for video timestamp extraction and scene classification
Use Gemini 1.5 Flash for videos under 30 minutes when the task needs only single-frame understanding; it delivers about 95% of Pro's accuracy at roughly 1/18th the cost \($0.07 vs $1.25 per 1M tokens\)
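The routing rule above can be sketched as a small helper. This is a minimal illustration, not part of the report: `TaskKind`, `pick_model`, and the 30-minute constant are hypothetical names chosen here, and the two task categories are an assumed simplification of "single-frame" versus "cross-scene" questions.

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical task categories used only for this sketch."""
    SINGLE_FRAME = "single_frame"  # e.g. "What happens at 00:05:23?", scene classification
    CROSS_SCENE = "cross_scene"    # e.g. relating the ending to the opening scene


FLASH = "gemini-1.5-flash"
PRO = "gemini-1.5-pro"
MAX_FLASH_MINUTES = 30  # threshold taken from the report's recommendation


def pick_model(duration_minutes: float, task: TaskKind) -> str:
    """Return Flash only for short videos with single-frame tasks, else Pro."""
    if task is TaskKind.SINGLE_FRAME and duration_minutes < MAX_FLASH_MINUTES:
        return FLASH
    return PRO


if __name__ == "__main__":
    print(pick_model(12, TaskKind.SINGLE_FRAME))  # short clip, timestamp lookup
    print(pick_model(12, TaskKind.CROSS_SCENE))   # needs reasoning across scenes
    print(pick_model(45, TaskKind.SINGLE_FRAME))  # longer than the 30-minute cutoff
```

Defaulting to Pro whenever either condition fails keeps the cheaper model confined to the cases the report actually vouches for.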
Journey Context:
Both models accept context windows of at least 1M tokens, but Flash is a smaller model distilled from Pro. For single-frame tasks such as 'What happens at 00:05:23?' or 'Classify this scene', Flash reaches ~95% of Pro's accuracy on the Video-MME benchmark. The failure mode is cross-scene temporal reasoning \(e.g., 'How does the ending relate to the opening scene?'\), where Flash drops to ~80% accuracy. For content moderation pipelines processing user uploads, Flash reduces costs from $15k/day to $750/day while maintaining 98% precision on object detection tasks.
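The cost arithmetic is worth checking against the quoted per-token prices. The sketch below assumes the daily bill scales linearly with the per-1M-token price and that token volume is identical for both models; the function names are hypothetical. Under those assumptions the quoted prices imply a ratio of about 18x, slightly short of the 20x implied by the $15k to $750 figures.

```python
# Prices as quoted in this report (USD per 1M tokens); check current pricing before relying on them.
FLASH_USD_PER_M = 0.07
PRO_USD_PER_M = 1.25


def price_ratio(pro: float = PRO_USD_PER_M, flash: float = FLASH_USD_PER_M) -> float:
    """How many times cheaper Flash is than Pro per token."""
    return pro / flash


def projected_flash_daily_cost(pro_daily_usd: float) -> float:
    """Scale a Pro daily bill to Flash, assuming the same token volume."""
    return pro_daily_usd / price_ratio()


if __name__ == "__main__":
    print(f"price ratio: {price_ratio():.1f}x")                               # 17.9x
    print(f"projected Flash cost: ${projected_flash_daily_cost(15_000):.0f}/day")  # $840/day
```

The gap between $840 and $750 per day is small next to the overall saving, but it suggests the $750 figure was derived from a rounded 20x ratio rather than from the listed prices.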
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:18:28.345296+00:00— report_created — created