Report #39186
[counterintuitive] Model cannot solve a maze or grid puzzle from a text description — the description must be unclear
Transform spatial problems into non-spatial symbolic representations (adjacency lists, coordinate pairs, graph structures) before passing to the LLM; for visual-spatial tasks, use vision-capable models with actual images rather than text descriptions of layouts
Journey Context:
If you describe a 5×5 grid, a maze, or a chess board in text, it seems like a sufficiently smart model should be able to reason about it. In practice, LLMs perform poorly on spatial reasoning from text, and no amount of prompt refinement fully solves this. The reason is structural: spatial relationships are inherently multi-dimensional, but text (and thus a token sequence) is one-dimensional. When you describe a grid row by row, the model processes it as a linear sequence. The spatial adjacency that a human perceives visually, for example that cell (2,3) is directly above cell (3,3), must be inferred from the linear description, and the model's attention patterns do not reliably recover 2D structure from 1D input. The fix is to change the representation: instead of describing a grid as ASCII art, provide an adjacency list; instead of describing a maze in prose, provide the graph structure. This converts the problem from spatial reasoning (hard for LLMs) into graph traversal (much more amenable to sequential reasoning).
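The representation change described above might look like the following minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the original report: the `#`-for-wall grid encoding, the sample `MAZE`, the (row, column) coordinate convention, and the hypothetical helper names `grid_to_adjacency`, `adjacency_to_prompt`, and `shortest_path`. It converts an ASCII maze into an explicit adjacency list, renders that list as plain text that could be placed in a prompt, and uses a breadth-first search to produce a reference answer.

```python
from collections import deque

# Hypothetical sample maze: '#' is a wall, anything else is an open cell.
MAZE = [
    "S.#",
    ".##",
    "..G",
]

def grid_to_adjacency(rows):
    """Map each open cell (row, col) to its open up/down/left/right neighbours."""
    open_cells = {
        (r, c)
        for r, row in enumerate(rows)
        for c, ch in enumerate(row)
        if ch != "#"
    }
    adjacency = {}
    for r, c in sorted(open_cells):
        candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        adjacency[(r, c)] = [n for n in candidates if n in open_cells]
    return adjacency

def adjacency_to_prompt(adjacency):
    """Render the graph as one 'cell -> neighbours' line per cell."""
    lines = []
    for cell, neighbours in adjacency.items():
        targets = ", ".join(str(n) for n in neighbours) or "none"
        lines.append(f"{cell} -> {targets}")
    return "\n".join(lines)

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of cells, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        for neighbour in adjacency[cell]:
            if neighbour not in previous:
                previous[neighbour] = cell
                queue.append(neighbour)
    return None

graph = grid_to_adjacency(MAZE)
prompt_text = adjacency_to_prompt(graph)
reference_path = shortest_path(graph, (0, 0), (2, 2))
```

Here `prompt_text` is what would be sent to the model in place of the ASCII art, so the adjacency no longer has to be inferred from row order. `reference_path` is computed deterministically and can be used to check the model's answer; for a maze this small it also suggests that a classical search may be the simpler tool when a programmatic solution is acceptable.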
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:14:36.405847+00:00— report_created — created