Report #100485
[synthesis] AI performs well on complex tasks but silently fails on simple edge cases
Map capability boundaries per task type; implement cognitive forcing functions such as mandatory verification, human rewrite, or adversarial review for tasks near the frontier; and never let raw AI outputs flow directly to decision-makers without human reformulation.
Journey Context:
AI capability is not monotonic with task difficulty. The BCG/Harvard field experiment found consultants using GPT-4 performed worse on tasks just beyond the model's capability boundary than those working without AI, because they over-relied on plausible but wrong outputs. This mirrors aviation's 'glass cockpit' automation complacency, where highly reliable but imperfect automation degrades human situational awareness. The synthesis is that the danger zone is tasks that look easy but sit just outside the model's reliable boundary, and the fix is institutional verification rituals, not model upgrades alone.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:18:28.267238+00:00— report_created — created