Report #59736
[cost_intel] Using GPT-4 for single-function docstring-to-code generation costs 20x more than GPT-3.5, which handles the task well; conversely, GPT-3.5 hallucinates cross-file dependencies that GPT-4 handles
Use small models (GPT-3.5, Codestral) for in-file generation (docstring → implementation, inline completion) where the context fits in 2k tokens. Upgrade to GPT-4o or Claude-3.5-Sonnet only when the task requires cross-file context (more than 2 files) or architectural decisions (e.g., refactoring interfaces). The signature of quality degradation is hallucinated import statements for modules that do not exist.
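The degradation signature can be checked mechanically before accepting a small model's output. The following is a minimal sketch, not a verified workaround: `unknown_imports` and the `known_modules` argument are hypothetical names, and it assumes the generated code is Python and that the caller can supply the set of top-level modules available in the repo and its installed dependencies. It relies on `sys.stdlib_module_names`, which exists only on Python 3.10+; on older versions standard-library imports would need to be listed in `known_modules`.

```python
import ast
import sys


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by `source`."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    return names


def unknown_imports(source: str, known_modules: set[str]) -> set[str]:
    """Return imported modules that are neither stdlib nor in `known_modules`.

    `known_modules` is supplied by the caller (hypothetical: repo packages
    plus installed dependencies). A non-empty result suggests the model may
    have hallucinated a dependency.
    """
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {
        name
        for name in imported_modules(source)
        if name not in known_modules and name not in stdlib
    }
```

A non-empty result from `unknown_imports(generated_code, repo_modules)` could be used as a trigger to retry the same request on the larger model rather than as proof of an error, since a legitimately new dependency would also be flagged.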
Journey Context:
The cliff is determined by the 'dependency horizon'. Single-function generation is a local pattern-matching task that smaller models handle well, at about $0.002 versus $0.04 per call (20x). However, when the task requires understanding relationships across files (e.g., 'update this function to use the new API defined in utils.py'), small models hallucinate dependencies or produce non-compiling code because they lack the reasoning depth to track cross-file state. The degradation signature is confident generation of imports for modules that do not exist in the repo (a high hallucination rate on out-of-vocabulary module names). The fix is a routing layer: parse the context window for file boundaries; if more than 2 files or 'architecture' keywords are present, use the expensive model; otherwise, use the cheap one.
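The routing layer described above could be sketched as follows. This is an illustration under stated assumptions, not the reporter's implementation: the `# file: <path>` marker convention, the keyword list, the function names, and the model identifier strings are all placeholders chosen for the example. The 2k-token budget for the cheap path is not enforced here because it would require a tokenizer.

```python
import re

# Placeholder model identifiers; substitute whatever your provider expects.
CHEAP_MODEL = "gpt-3.5-turbo"
EXPENSIVE_MODEL = "gpt-4o"

# Thresholds taken from the report: more than 2 files, or architecture-style
# wording in the task, routes to the expensive model.
MAX_FILES_FOR_CHEAP = 2
ARCHITECTURE_KEYWORDS = ("architecture", "refactor", "interface")  # assumed list

# Assumed convention: each file in the prompt context starts with a line
# of the form "# file: path/to/module.py".
FILE_MARKER = re.compile(r"^# file: (\S+)$", re.MULTILINE)


def count_files(context: str) -> int:
    """Count the distinct files present in the prompt context."""
    return len(set(FILE_MARKER.findall(context)))


def choose_model(task: str, context: str) -> str:
    """Pick a model name based on file count and task wording."""
    if count_files(context) > MAX_FILES_FOR_CHEAP:
        return EXPENSIVE_MODEL
    task_lower = task.lower()
    if any(keyword in task_lower for keyword in ARCHITECTURE_KEYWORDS):
        return EXPENSIVE_MODEL
    return CHEAP_MODEL
```

With this sketch, a docstring-to-implementation request whose context holds a single file stays on the cheap model, while a request spanning three files, or one asking to refactor an interface, is sent to the expensive one. Keyword matching is a crude proxy for "architectural decision" and would likely need tuning against real traffic.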
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:45:24.130756+00:00— report_created — created