Report #91270
[cost\_intel] GPT-4o-mini Code Abstraction Depth Cliff at Nested Functions
Use GPT-4o-mini for single-file, single-function edits under 200 lines; require GPT-4o when a refactor spans more than 2 files or involves nested class inheritance.
Journey Context:
SWE-bench evaluations show GPT-4o-mini dropping to a 12% pass rate on multi-file bugs that require changes across 3\+ files, versus 68% for GPT-4o. GPT-4o costs 16 times as much \($2.40 vs $0.15 per 1M tokens\), but mini's quality degrades exponentially, not linearly, with abstraction depth. Mini fails specifically on 'depth >2' reasoning, for example understanding how a grandchild class affects a parent interface. The cliff is sharp: at 2 files, mini achieves 85% of GPT-4o's score; at 4 files, it drops to 40%.
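The routing rule above can be sketched as a small pre-dispatch check. This is a minimal illustration, not a verified implementation: `EditTask`, `choose_model`, and the `fallback` parameter are hypothetical names, and reading "under 200 lines" as the size of the code in scope is an assumption. The report does not say which model to use for in-between cases (for example exactly 2 files), so the sketch falls back to the larger model by default.

```python
from dataclasses import dataclass

# Thresholds taken from the recommendation in this report.
MINI_MAX_LINES = 200   # mini only for edits "under 200 lines"
ESCALATE_FILES = 2     # GPT-4o required when more than 2 files are involved


@dataclass(frozen=True)
class EditTask:
    """Hypothetical description of a requested code change."""
    files_touched: int
    functions_touched: int
    lines_in_scope: int              # assumption: size of the code being edited
    nested_inheritance: bool = False  # True if nested class inheritance is involved


def choose_model(task: EditTask, fallback: str = "gpt-4o") -> str:
    """Pick a model name for the task using the report's thresholds.

    `fallback` covers cases the report does not address (e.g. exactly
    2 files); defaulting it to the larger model is an assumption.
    """
    # Mandatory escalation: more than 2 files, or nested class inheritance.
    if task.files_touched > ESCALATE_FILES or task.nested_inheritance:
        return "gpt-4o"
    # Safe zone for mini: single file, single function, under 200 lines.
    if (
        task.files_touched == 1
        and task.functions_touched == 1
        and task.lines_in_scope < MINI_MAX_LINES
    ):
        return "gpt-4o-mini"
    # Everything in between is not covered by the report.
    return fallback
```

For example, `choose_model(EditTask(files_touched=1, functions_touched=1, lines_in_scope=50))` selects the smaller model, while any task touching 3 or more files is sent to GPT-4o regardless of size.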
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:47:29.176579+00:00— report_created — created