Report #4780
[research] What agent architecture pattern actually works for complex coding tasks?
Start with tool use alone; add Reflection when outputs have objective correctness criteria; switch to Plan-and-Execute (or ReAct with iteration limits) only when tasks have multi-step dependencies that cannot be pre-specified; reserve Multi-Agent for problems that genuinely need specialized roles with distinct tool access. Add human-in-the-loop approval before any irreversible action.
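The escalation rule above can be sketched as a small decision function. This is an illustrative sketch only: `TaskTraits`, `choose_pattern`, and `may_proceed` are hypothetical names, the three boolean traits are a simplification of the criteria in the recommendation, and the check order (most demanding requirement first) is one reasonable reading of it, not a prescribed algorithm.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskTraits:
    # Hypothetical flags; each mirrors one criterion from the recommendation.
    has_objective_checks: bool          # tests, types, or linters can judge the output
    has_unplannable_dependencies: bool  # multi-step dependencies that cannot be pre-specified
    needs_specialized_roles: bool       # distinct roles with distinct tool access


def choose_pattern(t: TaskTraits) -> str:
    """Return the simplest pattern that covers the stated need."""
    if t.needs_specialized_roles:
        return "multi-agent"
    if t.has_unplannable_dependencies:
        return "plan-and-execute"
    if t.has_objective_checks:
        return "tool-use+reflection"
    return "tool-use"


def may_proceed(action: str, irreversible: bool,
                approve: Callable[[str], bool]) -> bool:
    """Human-in-the-loop gate: irreversible actions need explicit approval."""
    return (not irreversible) or approve(action)


# Usage: a task with a test suite but no open-ended dependencies.
pattern = choose_pattern(TaskTraits(True, False, False))
# Usage: an irreversible action is blocked when the approver declines.
allowed = may_proceed("git push --force", True, lambda action: False)
```

In a real system `approve` would prompt a person; here it is a stand-in callable so the gate can be exercised without any I/O.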
Journey Context:
ReAct is simple and interpretable, but errors accumulate over long horizons. Reflection improves quality when the critique can be checked against external signals (tests, types, linters). Plan-and-Execute decomposes tasks, but the plan itself is brittle and replanning adds cost and latency. Multi-Agent is seductive but introduces coordination overhead and inconsistency between agents; published meta-analyses show human-in-the-loop (HITL) + Reflection beating pure ReAct and Multi-Agent on SWE-like benchmarks. The cardinal sin is adopting a complex topology before the failure mode demands it: each extra layer adds latency, token cost, and debugging surface. Match the pattern to the observed failure mode, not to the imagined elegance of the architecture.
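A minimal sketch of Reflection driven by an external signal, with an iteration cap to bound cost and error accumulation. All names here are hypothetical: `reflect_loop`, `syntax_check`, and `fake_generate` are illustrations, the model call is replaced by a canned list of drafts, and Python's built-in `compile` stands in for a real test suite, type checker, or linter.

```python
from typing import Callable, Optional, Tuple


def reflect_loop(generate: Callable[[Optional[str]], str],
                 check: Callable[[str], Optional[str]],
                 max_iters: int = 3) -> Tuple[str, bool, int]:
    """Generate, check externally, and retry with the feedback.

    `check` returns None on success or an error message to feed back.
    Returns (last candidate, passed, attempts used).
    """
    feedback: Optional[str] = None
    candidate = ""
    for attempt in range(1, max_iters + 1):
        candidate = generate(feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, True, attempt
    return candidate, False, max_iters


def syntax_check(src: str) -> Optional[str]:
    """Stand-in external signal: does the candidate at least parse?"""
    try:
        compile(src, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"


# Canned drafts replace a model call: the first is broken, the second parses.
drafts = iter(["def f(:\n    return 1\n", "def f():\n    return 1\n"])


def fake_generate(feedback: Optional[str]) -> str:
    return next(drafts)


code, passed, attempts = reflect_loop(fake_generate, syntax_check)
```

The critique comes from the checker rather than from the model's own opinion of its output, which is the condition under which the text says Reflection helps. When the cap is hit without a pass, the caller receives `passed == False` and can escalate to a person instead of looping further.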
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:03:43.279342+00:00— report_created — created