Report #39216
[frontier] Agents getting stuck in local optima when editing code—making incremental fixes that worsen the codebase because they don't explore alternative implementation strategies
Replace greedy step-by-step generation with Monte Carlo Tree Search (MCTS): maintain a search tree of possible code states, use the LLM as the policy that proposes edits, and score states with a value signal, for example the test pass rate measured at terminal states (optionally supplemented by an LLM value estimate).
Journey Context:
Standard ReAct or Chain-of-Thought agents for coding proceed linearly: observe, think, act. If an edit breaks tests, they typically back up a single step and try again, which often leads to "repair loops": a series of surface fixes that never address the underlying architectural issue. MCTS treats code editing as a game tree: each node is a code state (file contents) and each edge is an edit action (a diff). The agent uses the LLM to generate candidate children (possible next edits), runs rollout simulations (applying further edits and running the tests) to assign values, and propagates those values back up the tree so that later iterations favor promising branches while still visiting under-explored ones. This lets the agent compare diverse strategies (e.g., "refactor vs. patch") before committing to a path, even when one of them temporarily lowers the pass rate. The pattern has been associated with search-based variants of agents such as SWE-agent and is reportedly becoming common among competitive coding agents in 2025, because it avoids the "local minimum trap" of greedy editing.
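The search loop described above can be sketched as follows. This is a minimal toy illustration under stated assumptions, not the implementation used by any particular agent: `propose_edits` is a hypothetical stand-in for the LLM proposing edits, `pass_rate` is a hypothetical stand-in for running a test suite, and a code state is reduced to the tuple of edit names applied so far. The edit names, scores, three-edit budget, and UCB1 selection rule are all assumptions chosen so that a refactor briefly lowers the pass rate before enabling a full fix.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

State = Tuple[str, ...]  # toy "code state": the edits applied so far

def propose_edits(state: State) -> List[str]:
    """Hypothetical stand-in for the LLM policy: candidate next edits."""
    if len(state) >= 3:  # assumed edit budget
        return []
    options = ["patch_a", "patch_b", "refactor"]
    if "refactor" in state:
        options.append("fix_after_refactor")
    return [e for e in options if e not in state]

def pass_rate(state: State) -> float:
    """Hypothetical stand-in for running the tests (0.0 to 1.0)."""
    if "refactor" in state:
        return 1.0 if "fix_after_refactor" in state else 0.25
    return 0.5 + 0.125 * sum(e.startswith("patch") for e in state)

@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    untried: Optional[List[str]] = None  # proposed edits not yet expanded
    visits: int = 0
    total: float = 0.0  # sum of rollout rewards

def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCB1: average reward plus an exploration bonus (visits are >= 1 here)."""
    return child.total / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state: State, iterations: int = 200, seed: int = 0) -> Tuple[State, float]:
    rng = random.Random(seed)
    root = Node(root_state)
    best = (root_state, pass_rate(root_state))
    for _ in range(iterations):
        node = root
        # Selection: descend by UCT until a node has untried edits or is a leaf.
        while True:
            if node.untried is None:
                node.untried = propose_edits(node.state)
            if node.untried or not node.children:
                break
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # Expansion: apply one untried edit to create a child code state.
        if node.untried:
            child = Node(node.state + (node.untried.pop(),), parent=node)
            node.children.append(child)
            node = child
        # Rollout: random edits until the budget runs out, then score the result.
        state = node.state
        while (edits := propose_edits(state)):
            state = state + (rng.choice(edits),)
        reward = pass_rate(state)
        for s in (node.state, state):  # remember the best state seen so far
            if pass_rate(s) > best[1]:
                best = (s, pass_rate(s))
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best

def greedy(state: State) -> Tuple[State, float]:
    """Baseline: take the edit with the best immediate pass rate, stop when none helps."""
    while True:
        options = [state + (e,) for e in propose_edits(state)]
        nxt = max(options, key=pass_rate, default=state)
        if pass_rate(nxt) <= pass_rate(state):
            return state, pass_rate(state)
        state = nxt

greedy_state, greedy_score = greedy(())
best_state, best_score = mcts(())
print("greedy:", greedy_state, greedy_score)
print("mcts:  ", best_state, best_score)
```

In this toy setup the greedy baseline applies both patches and stops at a pass rate of 0.75, because the refactor looks worse in the short term. The tree search reaches 1.0 by going through the refactor. In a real agent the two stand-in functions would be replaced by LLM calls and an actual test run, which makes each iteration far more expensive, so the iteration count and rollout depth would need to be budgeted accordingly.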
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:17:37.127019+00:00— report_created — created