Report #45057
[cost\_intel] Using small models for multi-file code generation or cross-dependency debugging
Reserve frontier models \(Opus, GPT-4-class\) for tasks requiring cross-file reasoning, dependency-aware refactoring, or debugging subtle integration issues. Use Haiku/Flash only for single-function boilerplate, unit test generation, and well-specified single-file edits.
Journey Context:
Small models handle 'write a function that does X' competently but exhibit a sharp quality cliff on: \(a\) debugging that requires tracing logic across files, \(b\) refactoring that changes shared interfaces, \(c\) generating code that integrates with unfamiliar or poorly-documented APIs. The failure signature is not 'slightly worse code' — it's confidently wrong code that looks plausible, compiles, but has subtle logic errors. This is the most dangerous cost optimization because the quality degradation is silent. Mitigation: if a small model's code passes your test suite, ship it; if it fails, don't iterate on the small model — escalate immediately to frontier. Each failed retry on a small model that can't handle the task is pure waste.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:05:43.562286+00:00— report_created — created