Report #85451
[cost\_intel] Cheap models fail silently on complex code generation
Use GPT-4o-mini or Haiku for syntax-only tasks \(e.g. line completion\), and switch to GPT-4o or Claude 3.5 Sonnet for algorithmically complex work \(recursion, graph traversal\). Gate the routing decision with an AST depth check.
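A minimal sketch of such a gate, assuming the code surrounding the request is Python and can be parsed with the standard `ast` module. The names `nesting_depth`, `pick_model`, and `NESTING_NODES` are hypothetical, the "cheap"/"strong" labels stand in for whichever models are configured, and the choice of which node types count as a nesting level is an assumption rather than a standard metric. The default threshold of 3 is the depth signature noted in this report.

```python
import ast

# Node types counted as one nesting level (an illustrative choice, not a standard).
# Note: Python parses `elif` as an If inside the previous If's orelse,
# so long elif chains inflate the measured depth.
NESTING_NODES = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try,
)


def nesting_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest chain of nested block statements under `node`."""
    if isinstance(node, NESTING_NODES):
        depth += 1
    deepest = depth
    for child in ast.iter_child_nodes(node):
        deepest = max(deepest, nesting_depth(child, depth))
    return deepest


def pick_model(context_source: str, max_depth: int = 3) -> str:
    """Return 'strong' when the context nests deeper than max_depth, else 'cheap'."""
    try:
        tree = ast.parse(context_source)
    except SyntaxError:
        # Unparseable context: fall back to the strong tier rather than guess.
        return "strong"
    return "strong" if nesting_depth(tree) > max_depth else "cheap"
```

Falling back to the strong tier on a parse failure is a conservative design choice: incomplete code is common during editing, and misrouting it to the cheap tier is the silent-failure case this report warns about.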
Journey Context:
GPT-4o-mini costs about $0.15 per 1M tokens versus $2.50 per 1M for GPT-4o \(roughly 17x cheaper\), but it fails on nested logic, complex type inference, and multi-file refactoring. The failure mode is silent: the code compiles but contains logic errors in edge cases \(off-by-one errors, null handling\). Signature of cheap-model failure: the code involved has an AST depth > 3 or a cyclomatic complexity > 10. Pattern: use cheap models for 'draft generation' or 'syntax completion', then use strong models for 'review' or 'critique' \(critique is cheaper than generation\). Alternatively, use cheap models in constrained \(JSON\) output mode for extraction, but not for creative code generation.
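The price gap above can be checked with a few lines of arithmetic. This sketch uses only the two per-token prices quoted in this report and treats each as a single flat rate, which is a simplification. The 100M-token workload and the 20% share routed to the strong model are hypothetical figures chosen for illustration, as are the function names.

```python
# Prices quoted in this report, in USD per 1M tokens.
CHEAP_PER_M = 0.15   # GPT-4o-mini
STRONG_PER_M = 2.50  # GPT-4o


def cost_usd(tokens: float, price_per_million: float) -> float:
    """Cost of `tokens` tokens at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def blended_cost_usd(total_tokens: float, strong_fraction: float) -> float:
    """Cost when `strong_fraction` of the tokens goes to the strong model."""
    if not 0.0 <= strong_fraction <= 1.0:
        raise ValueError("strong_fraction must be between 0 and 1")
    strong_tokens = total_tokens * strong_fraction
    cheap_tokens = total_tokens - strong_tokens
    return cost_usd(cheap_tokens, CHEAP_PER_M) + cost_usd(strong_tokens, STRONG_PER_M)


ratio = STRONG_PER_M / CHEAP_PER_M
print(f"price ratio: {ratio:.1f}x")  # 16.7x, which the report rounds to 17x

# Hypothetical workload: 100M tokens, 20% routed to the strong model.
print(f"all cheap:  ${blended_cost_usd(100_000_000, 0.0):.2f}")   # $15.00
print(f"20% strong: ${blended_cost_usd(100_000_000, 0.2):.2f}")   # $62.00
print(f"all strong: ${blended_cost_usd(100_000_000, 1.0):.2f}")   # $250.00
```

Under these assumptions, even a modest share of strong-model traffic dominates the bill, so the accuracy of the routing gate matters as much as the per-token prices.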
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:00:58.216089+00:00— report_created — created