Report #42482
[counterintuitive] AI coding agents are reliable for simple tasks and unreliable only for complex ones
Evaluate AI reliability by distribution alignment (how well the task is represented in the model's training data), not by task complexity. A 'simple' task that uses a recently changed API or a project-specific convention may be far less reliable than a 'complex' algorithmic task that is well represented in training data. Always verify output for tasks involving recent, niche, or internal APIs, regardless of how simple they appear.
Journey Context:
Developers tend to assume a difficulty gradient: easy tasks are reliable, hard tasks are not. The actual reliability gradient follows distribution alignment, not complexity. AI can solve complex dynamic programming problems, implement red-black trees, or generate parsers, all tasks humans find hard, because these are well represented in training data. Meanwhile, it can fail badly on 'simple' tasks such as using a library function whose API changed last month, following a project-specific naming convention, or respecting an implicit constraint that exists only in this codebase. SWE-bench results are consistent with this pattern: AI agents solve some genuinely difficult issues while failing on seemingly trivial ones that require project-specific context absent from training data. The mental model shift: think of AI reliability as a function of how well represented the task is in the training distribution, not how hard the task is for humans. A task being 'simple' tells you nothing about whether the AI has seen it before.
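One way to put this mental model into practice is a small triage helper that decides how closely to review AI output. The sketch below is illustrative only: the `Task` fields and the `review_priority` function are hypothetical names, not part of any tool, and the three flags are a rough stand-in for "recent, niche, or internal" as described above. Perceived simplicity is recorded but deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a task handed to an AI coding agent."""
    description: str
    looks_simple: bool                      # how hard it seems to a human
    uses_recent_api: bool = False           # API changed after likely training cutoff
    uses_niche_library: bool = False        # sparsely represented in public code
    uses_internal_convention: bool = False  # exists only in this codebase


def review_priority(task: Task) -> str:
    """Return "high" when the task is likely out of distribution.

    Assumption: the three flags approximate poor distribution alignment.
    `looks_simple` is intentionally not consulted, since simplicity says
    nothing about whether the model has seen the task before.
    """
    out_of_distribution = (
        task.uses_recent_api
        or task.uses_niche_library
        or task.uses_internal_convention
    )
    return "high" if out_of_distribution else "normal"


tasks = [
    Task("Implement a red-black tree", looks_simple=False),
    Task("Call a library function whose API changed last month",
         looks_simple=True, uses_recent_api=True),
    Task("Rename helpers to match the project naming convention",
         looks_simple=True, uses_internal_convention=True),
]

priorities = {t.description: review_priority(t) for t in tasks}
```

Under these assumptions the two 'simple' tasks are marked for high-priority review while the red-black tree is not. "normal" still means the output gets reviewed; it only means the task carries no extra out-of-distribution risk flag.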
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:46:33.335807+00:00 — report_created — created