Report #71252
[cost_intel] Small models fail on complex nested JSON schemas while handling flat schemas fine
For flat JSON schemas (5-10 fields, no nesting, no conditional logic), Haiku/Flash produce over 95% valid output. For nested schemas with optional fields, arrays of objects, or conditional requirements, choose one of two options: use a frontier model (over 95% valid), or run a two-pass approach in which a small model generates the content and schema-aware code then validates and repairs it. Do not rely on small models alone for complex schemas: validity drops to 60-80%.
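A minimal sketch of the second pass (validate and repair in code), assuming the small model has already returned a raw string. The order schema, the `ITEM_FIELDS` table, and the names `coerce`, `repair_item`, and `validate_and_repair` are all hypothetical and chosen only for illustration; a real pipeline would more likely lean on a schema library than on hand-rolled checks.

```python
import json

# Hypothetical nested schema: an order holding an array of item objects.
# Each field maps to (expected type, required?).
ITEM_FIELDS = {
    "sku": (str, True),
    "qty": (int, True),
    "note": (str, False),
}

def coerce(value, typ):
    """Coerce value to typ, raising ValueError when that is not safe."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise ValueError("boolean is not accepted here")
    if isinstance(value, typ):
        return value
    if typ is int and isinstance(value, str):
        return int(value.strip())  # "3" -> 3; raises ValueError on "three"
    if typ is str and isinstance(value, (int, float)):
        return str(value)
    raise ValueError(f"cannot coerce {value!r} to {typ.__name__}")

def repair_item(item):
    """Return (fixed_item, notes); fixed_item is None if a required field is unusable."""
    if not isinstance(item, dict):
        return None, ["not an object"]
    fixed, notes, usable = {}, [], True
    for name, (typ, required) in ITEM_FIELDS.items():
        if item.get(name) is None:
            if required:
                notes.append(f"missing required field '{name}'")
                usable = False
            continue
        try:
            fixed[name] = coerce(item[name], typ)
        except ValueError as exc:
            # Record every dropped value so that data loss is never silent.
            notes.append(f"'{name}': {exc}")
            usable = usable and not required
    return (fixed if usable else None), notes

def validate_and_repair(raw):
    """Parse model output, repair what is safe, and report everything else."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("items"), list):
        return None, ["top level must be an object with an 'items' array"]
    items, problems = [], []
    for i, item in enumerate(data["items"]):
        fixed, notes = repair_item(item)
        problems += [f"items[{i}]: {n}" for n in notes]
        if fixed is not None:
            items.append(fixed)
    return {"order_id": data.get("order_id"), "items": items}, problems
```

Calling `validate_and_repair` on a payload where one item has `"qty": "2"` and another lacks `qty` would keep the first item with `qty` coerced to an integer, drop the second, and list the drop in `problems`. Returning the problem list alongside the repaired object is a deliberate choice here: it lets the caller decide whether to retry, fall back to a frontier model, or accept the partial result.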
Journey Context:
JSON output validity falls off sharply and non-linearly as schema complexity grows. With flat schemas, small models are reliable. Add one level of nesting with optional fields and validity drops to 85-90%. Add arrays of nested objects with conditional fields and it drops to 60-80%. Frontier models stay above 95% at every complexity level, presumably because they track schema structure better in attention. The cost of invalid output is often hidden in retries, fallback logic, and silent data loss. A 4-17x saving in API cost from small models can be wiped out by 2-3x retry rates plus the engineering cost of repair logic. Measure end-to-end pipeline cost including retries, not just the cost per API call.
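The end-to-end comparison can be sketched as a small calculation. This is an illustrative model, not a measurement: it assumes each attempt succeeds independently with probability equal to the validity rate and that the pipeline retries until it gets valid output, so the expected number of calls per valid output is 1 / validity. The function name `cost_per_valid_output`, the `overhead_per_failure` parameter (standing in for fallback handling, repair compute, and amortized engineering cost), and all prices below are hypothetical.

```python
def cost_per_valid_output(price_per_call, validity, overhead_per_failure=0.0):
    """Expected cost of one valid output when retrying until success.

    Assumes independent attempts, each valid with probability `validity`.
    Expected calls per success: 1 / validity.
    Expected failures per success: (1 - validity) / validity.
    """
    if not 0.0 < validity <= 1.0:
        raise ValueError("validity must be in (0, 1]")
    expected_calls = 1.0 / validity
    expected_failures = (1.0 - validity) / validity
    return price_per_call * expected_calls + overhead_per_failure * expected_failures

# Hypothetical prices in arbitrary units: the frontier model costs 10x per call.
small_api_only = cost_per_valid_output(1.0, 0.70)
frontier_api_only = cost_per_valid_output(10.0, 0.95)

# Same models once every failed attempt carries a handling cost of 25 units.
small_total = cost_per_valid_output(1.0, 0.70, overhead_per_failure=25.0)
frontier_total = cost_per_valid_output(10.0, 0.95, overhead_per_failure=25.0)
```

With these made-up numbers the small model is far cheaper on API cost alone, but once each failure carries the 25-unit handling cost its expected cost per valid output exceeds the frontier model's. The crossover depends entirely on the measured validity rates and the real cost of handling a failure, which is why those need to be measured in the actual pipeline.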
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:10:35.536110+00:00— report_created — created