Report #44999
[counterintuitive] If the model gets it wrong I just need a better prompt — any failure is a prompt failure
Distinguish between instruction-following failures (the model can do the task but did not understand what you want) and capability failures (the model fundamentally cannot do it). For capability failures, no prompt improvement will help; you need a different tool, architecture, or approach. Diagnostic: if you can write a simple Python function that does the task, but the model still cannot do it reliably after multiple prompt iterations, treat it as a capability failure.
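The diagnostic above can be sketched as a small harness: score the model against the deterministic function after each prompt revision, then check whether the scores have plateaued below the target. This is a minimal sketch under stated assumptions, not a definitive method. The names `accuracy`, `diagnose`, `fake_model`, the 0.99 target, and the 0.02 minimum gain are all hypothetical choices, and `fake_model` stands in for whatever real model call you use.

```python
from typing import Callable, Sequence


def accuracy(predict: Callable[[str], str],
             reference: Callable[[str], str],
             inputs: Sequence[str]) -> float:
    """Fraction of inputs where the model's answer equals the reference answer."""
    if not inputs:
        raise ValueError("need at least one test input")
    hits = sum(predict(x).strip() == reference(x) for x in inputs)
    return hits / len(inputs)


def diagnose(scores: Sequence[float],
             target: float = 0.99,
             min_gain: float = 0.02) -> str:
    """Classify a history of accuracies, one score per prompt revision."""
    if not scores:
        raise ValueError("need at least one score")
    if max(scores) >= target:
        return "prompt_fixable"
    # Plateau: the last two revisions together gained less than min_gain.
    if len(scores) >= 3 and scores[-1] - scores[-3] < min_gain:
        return "likely_capability_failure"
    return "keep_iterating"


# Deterministic reference: the "simple Python function" from the diagnostic.
def count_r(word: str) -> str:
    return str(word.count("r"))


# Hypothetical stand-in for a real model call; it never answers above 2.
def fake_model(word: str) -> str:
    return str(min(word.count("r"), 2))


words = ["strawberry", "car", "error", "tree"]
score = accuracy(fake_model, count_r, words)
history = [0.10, 0.55, 0.60, 0.60, 0.61]  # made-up scores for illustration
verdict = diagnose(history)
```

The thresholds are judgment calls. The useful part is recording a score per revision, so that a plateau shows up in the numbers instead of being mistaken for progress.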
Journey Context:
The developer instinct when a model fails is to refine the prompt. This works often enough to reinforce the behavior, creating the false belief that any failure is a prompt failure. In reality, there is a hard boundary between 'the model doesn't understand what I'm asking' (fixable with better prompts, examples, or decomposition) and 'the model cannot perform this operation' (not fixable with any prompt). Character counting, precise arithmetic, long-chain logical deduction with many variables, and spatial reasoning fall into the latter category. The trap is that prompt refinement produces diminishing returns that look like progress: the model goes from 0% to 60% accuracy, and developers keep iterating to close the remaining gap without realizing they have hit an architectural ceiling. The last 40% requires a fundamentally different approach.
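One possible shape for the "different approach" is to hand the operations named above, such as character counting and precise arithmetic, to ordinary code and let the model only choose which tool to call. The following is a hedged sketch: `count_char`, `exact_sum`, `TOOLS`, and `run_tool` are hypothetical names, and how the model selects a tool (function calling, structured output, or something else) is left out.

```python
from fractions import Fraction
from typing import Callable, Dict, Sequence


def count_char(text: str, ch: str) -> int:
    """Exact count of a single character in text."""
    if len(ch) != 1:
        raise ValueError("ch must be a single character")
    return text.count(ch)


def exact_sum(terms: Sequence[str]) -> Fraction:
    """Sum decimal strings exactly, with no floating-point rounding."""
    return sum((Fraction(t) for t in terms), Fraction(0))


# Hypothetical registry: the model picks a tool name and arguments,
# and ordinary code computes the answer.
TOOLS: Dict[str, Callable] = {
    "count_char": count_char,
    "exact_sum": exact_sum,
}


def run_tool(name: str, *args):
    """Look up a registered tool by name and call it with the given arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*args)
```

For example, `run_tool("count_char", "strawberry", "r")` returns the exact count regardless of how the prompt is worded, which is the property prompt iteration alone cannot guarantee.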
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:59:55.810657+00:00— report_created — created