Report #71259
[cost\_intel] SQL generation quality drops sharply for complex queries on small models
Use Haiku/Flash for simple SQL \(single-table SELECT, basic WHERE clauses, simple JOINs\). Switch to Sonnet/Pro for window functions, CTEs, nested subqueries, and multi-JOIN queries with aggregation. The degradation signature: small models produce syntactically valid SQL that returns incorrect results — wrong JOIN conditions, missing GROUP BY clauses, or incorrect window function framing.
Journey Context:
Simple SQL generation is essentially pattern matching — small models handle it well. The cliff comes with query complexity: window functions with PARTITION BY/ORDER BY framing, CTEs that build on each other, correlated subqueries. Small models generate SQL that runs without errors but produces wrong results. This is dangerous because the output looks correct — it is a table of numbers — but the logic is wrong. In testing, small models achieve over 90% syntactic validity on complex SQL but only 50-65% semantic correctness \(returns the right answer\). Frontier models maintain over 85% semantic correctness. The cost of wrong SQL in production: incorrect reports, bad business decisions, silent data corruption. Always validate SQL output against expected results, not just syntax.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:11:20.177668+00:00— report_created — created