Report #44133
[cost_intel] When does SQL generation with window functions require reasoning models?
Use reasoning models (o3-mini/o1) for SQL that requires more than two nested window functions or recursive CTEs with complex predicates. On the Spider 1.0 'extra hard' subset, GPT-4o reaches 62% execution accuracy and o3-mini reaches 89%. The gap widens when the schema has more than 10 tables or when queries require self-joins with temporal filtering, which is exactly where step-by-step decomposition helps.
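The rule of thumb above could be encoded as a small routing check. This is a minimal sketch, not a tested policy: `QueryProfile`, `pick_model_tier`, and the tier labels are hypothetical names, the profile fields are assumed to be estimated from the request before generation, and treating the "gap widens" signals (large schema, temporal self-joins) as routing triggers is an assumption layered on top of the report.

```python
from dataclasses import dataclass


@dataclass
class QueryProfile:
    """Hypothetical, pre-generation estimate of how hard the target SQL is."""
    nested_window_functions: int = 0
    recursive_cte_with_complex_predicates: bool = False
    schema_tables: int = 1
    temporal_self_join: bool = False


def pick_model_tier(profile: QueryProfile) -> str:
    """Return 'reasoning' or 'instruct' using the thresholds from the report."""
    # Primary rule from the report: >2 nested window functions or recursive CTEs.
    if profile.nested_window_functions > 2:
        return "reasoning"
    if profile.recursive_cte_with_complex_predicates:
        return "reasoning"
    # Assumption: the report only says the accuracy gap widens here;
    # this sketch chooses to escalate on those signals as well.
    if profile.schema_tables > 10 or profile.temporal_self_join:
        return "reasoning"
    return "instruct"


simple = pick_model_tier(QueryProfile(nested_window_functions=1, schema_tables=4))
hard = pick_model_tier(QueryProfile(nested_window_functions=3, schema_tables=4))
wide = pick_model_tier(QueryProfile(schema_tables=12))
print(simple, hard, wide)
```

Keeping the thresholds in one function makes them easy to adjust if your own accuracy measurements differ from the figures quoted here.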
Journey Context:
Instruct models fail on SQL not because they lack syntax knowledge but because they fail to decompose a request such as 'find the second highest salary per department without using subqueries' into logical steps. The chain-of-thought of a reasoning model mimics query planning: it first identifies the partitions, then the rankings, then the filters. Latency is acceptable here (async analytics), so the 15x cost is justified by avoiding wrong dashboard data.
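The partition, ranking, filter decomposition can be illustrated with the 'second highest salary per department' request. This is a sketch under assumptions: the `employees` table and its rows are invented for illustration, it runs on Python's bundled `sqlite3` (window functions need SQLite 3.25 or newer), and it holds the ranking in a CTE because a window function result cannot be filtered in the `WHERE` clause of the same `SELECT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, not taken from the report.
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Ann', 'eng', 150), ('Bob', 'eng', 130), ('Cy', 'eng', 130),
  ('Dee', 'ops', 90),  ('Eli', 'ops', 80);
""")

QUERY = """
WITH ranked AS (
  SELECT department,
         salary,
         -- Steps 1 and 2: partition by department, rank salaries inside each partition.
         DENSE_RANK() OVER (
           PARTITION BY department ORDER BY salary DESC
         ) AS salary_rank
  FROM employees
)
-- Step 3: filter on the rank; DISTINCT collapses ties at the same salary.
SELECT DISTINCT department, salary
FROM ranked
WHERE salary_rank = 2
ORDER BY department;
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('eng', 130), ('ops', 80)]
```

`DENSE_RANK` is used rather than `ROW_NUMBER` so that tied top salaries do not push a duplicate of the highest value into second place; which behavior is correct depends on how the request defines 'second highest'.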
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:32:59.595598+00:00— report_created — created