Report #79925
[cost_intel] Debugging Heisenbugs vs deterministic syntax errors
Use GPT-4o for deterministic failures that come with a clear compilation error or stack trace (syntax errors, type errors, null pointers); reserve o3-mini for non-deterministic Heisenbugs (race conditions, memory leaks, timing-dependent failures).
Journey Context:
SWE-bench analysis shows o1 performs worse than GPT-4o on 'good first issues' (clear stack trace, single-file fix) because it over-analyzes unrelated code paths. GPT-4o fixes these in 1-2 turns at a cost of $0.01; o1 costs $0.50 and takes 20s longer. A Heisenbug that warrants a reasoning model shows at least one of these signs: the error disappears when logging is added (observer effect), it involves more than 2 threads, or it requires understanding happens-before relationships that are not explicit in the code. o3-mini's chain-of-thought traces help verify these temporal dependencies.
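The routing rule above can be sketched as a small triage function. This is only an illustration of the three signals listed in this report: the `BugSignals` type, its field names, and `pick_model` are hypothetical, and the returned strings are plain labels, not calls to any provider API.

```python
from dataclasses import dataclass


@dataclass
class BugSignals:
    """Hypothetical triage record; field names are assumptions for illustration."""
    vanishes_with_logging: bool = False  # observer effect: adding logs hides the bug
    thread_count: int = 1                # number of threads involved in the failure
    needs_happens_before: bool = False   # ordering constraints not explicit in code


def pick_model(signals: BugSignals) -> str:
    """Return a model label using the Heisenbug signature from this report."""
    is_heisenbug = (
        signals.vanishes_with_logging
        or signals.thread_count > 2
        or signals.needs_happens_before
    )
    # Any one signal is enough to route to the reasoning model;
    # everything else goes to the cheaper default.
    return "o3-mini" if is_heisenbug else "gpt-4o"


# A plain stack-trace bug stays on the cheaper model.
print(pick_model(BugSignals()))
# A failure that disappears under logging is routed to the reasoning model.
print(pick_model(BugSignals(vanishes_with_logging=True)))
```

Treating the signals as an OR keeps the default cheap: a bug is escalated only when at least one sign of non-determinism is actually observed.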
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:45:35.196351+00:00— report_created — created