Report #74924
[counterintuitive] Why can't LLMs reliably navigate a 2D grid or solve mazes from text descriptions?
Convert 2D spatial problems into explicit, linear coordinate representations (for example, a list of (row, col) cells and their neighbors), or use code execution to track spatial state. Do not ask LLMs to reason over raw 2D text grids.
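A minimal Python sketch of both suggestions, under stated assumptions: the maze encoding ('#' wall, '.' open, 'S' start, 'G' goal) and the names parse_grid, to_adjacency_lines, and shortest_path are hypothetical and chosen only for illustration. The first helper turns the grid into coordinates, the second serializes it as one explicit line per cell (a form that could be placed in a prompt), and the third solves the maze in code with breadth-first search instead of asking the model to trace it.

```python
from collections import deque

# Hypothetical maze encoding (an assumption, not from the report):
# '#' = wall, '.' = open, 'S' = start, 'G' = goal.
MAZE = "#####\n#S..#\n#.#.#\n#..G#\n#####"


def parse_grid(text):
    """Return (open_cells, start, goal), with cells as (row, col) tuples."""
    open_cells, start, goal = set(), None, None
    for r, line in enumerate(text.splitlines()):
        for c, ch in enumerate(line):
            if ch == "#":
                continue
            open_cells.add((r, c))
            if ch == "S":
                start = (r, c)
            elif ch == "G":
                goal = (r, c)
    return open_cells, start, goal


def neighbors(cell, open_cells):
    """Open cells directly up, down, left, and right of `cell`."""
    r, c = cell
    candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return [n for n in candidates if n in open_cells]


def to_adjacency_lines(open_cells):
    """Serialize the grid as one explicit 'cell -> neighbors' line per cell."""
    lines = []
    for cell in sorted(open_cells):
        nbrs = " ".join(f"({r},{c})" for r, c in neighbors(cell, open_cells))
        lines.append(f"({cell[0]},{cell[1]}) -> {nbrs}")
    return lines


def shortest_path(open_cells, start, goal):
    """Breadth-first search; returns the cells from start to goal, or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        for nxt in neighbors(cell, open_cells):
            if nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None


open_cells, start, goal = parse_grid(MAZE)
adjacency = to_adjacency_lines(open_cells)
path = shortest_path(open_cells, start, goal)
```

Here `adjacency` holds lines such as `(1,1) -> (2,1) (1,2)`, which state each vertical and horizontal link outright rather than leaving it implicit in the layout, and `path` is computed deterministically, so the model only needs to describe or use the result.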
Journey Context:
Humans parse 2D text grids easily because they see the layout. LLMs, however, process the same text as a 1D sequence of tokens. Once the grid is flattened, its spatial relationships are no longer local (e.g., a cell directly 'above' another is a full row of tokens away in the 1D sequence). The model's attention mechanism struggles to maintain the 2D topology across that distance, which leads to broken spatial reasoning. This is an input representation mismatch, not a reasoning deficit that a larger model would simply outgrow.
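The flattening effect can be made concrete with a small Python sketch. It counts characters rather than tokens, which is a simplifying assumption: real tokenizers may merge or split characters, so actual token distances will differ. The grid and the helper flat_index are hypothetical, used only for illustration.

```python
# Hypothetical 5x5 grid, rows separated by newlines.
GRID = "#####\n#S..#\n#.#.#\n#..G#\n#####"


def flat_index(row, col, width):
    """Position of cell (row, col) in the flattened string.

    Each row occupies `width` characters plus one newline.
    """
    return row * (width + 1) + col


width = GRID.index("\n")  # characters per row

# Horizontal neighbors stay adjacent in the sequence.
right_gap = flat_index(1, 2, width) - flat_index(1, 1, width)

# Vertical neighbors end up a full row (plus newline) apart.
below_gap = flat_index(2, 1, width) - flat_index(1, 1, width)
```

For this grid `right_gap` is 1 and `below_gap` is 6. The vertical gap grows with the row width, so on a 40-column grid the cell directly below would sit 41 characters away, while the horizontal neighbor stays at distance 1.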
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:21:19.463110+00:00— report_created — created