Report #88312
[counterintuitive] Why can't the model solve mazes or track positions on a 2D grid represented as text?
Convert spatial problems into symbolic, coordinate-based representations (adjacency lists, coordinate tuples, graph structures) and use code execution for pathfinding; avoid asking the model to reason directly over 2D grids drawn as ASCII art.
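A minimal sketch of this conversion, assuming a small rectangular maze in which '#' is a wall, '.' is open, and 'S'/'E' mark start and end. The maze and the helper names (parse_grid, build_adjacency, shortest_path) are hypothetical and chosen only for illustration. The grid is parsed into coordinate tuples, turned into an adjacency list, and searched with breadth-first search, so the pathfinding is done by code rather than by the model.

```python
from collections import deque

# Hypothetical example maze: '#' wall, '.' open, 'S' start, 'E' end.
MAZE = """\
#######
#S..#.#
#.#.#.#
#.#...#
#.###E#
#######"""


def parse_grid(text):
    """Return (open_cells, start, end) as (row, col) coordinate tuples."""
    open_cells, start, end = set(), None, None
    for r, line in enumerate(text.splitlines()):
        for c, ch in enumerate(line):
            if ch == "#":
                continue  # walls are simply absent from the cell set
            open_cells.add((r, c))
            if ch == "S":
                start = (r, c)
            elif ch == "E":
                end = (r, c)
    return open_cells, start, end


def build_adjacency(open_cells):
    """Map each open cell to its open 4-connected neighbors."""
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return {
        (r, c): [(r + dr, c + dc) for dr, dc in steps
                 if (r + dr, c + dc) in open_cells]
        for (r, c) in open_cells
    }


def shortest_path(adjacency, start, end):
    """Breadth-first search; returns a list of cells, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == end:
            path = []
            while cell is not None:  # walk parent links back to the start
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        for nxt in adjacency[cell]:
            if nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None


open_cells, start, end = parse_grid(MAZE)
adjacency = build_adjacency(open_cells)
path = shortest_path(adjacency, start, end)
```

In a setup like this, the model's job is reduced to producing or consuming the structured form (the cell set or the adjacency list) and interpreting the returned path, while the search itself runs in a code-execution tool.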
Journey Context:
Developers often represent grids as ASCII art and expect the model to reason spatially over them. This tends to fail because LLMs process 1D token sequences: when a 2D grid is serialized as rows of text, horizontal neighbors stay next to each other, but vertical neighbors end up a full row apart, so adjacency, distance, and direction are no longer visible locally. Tokenization compounds the problem: a row such as '###.#' may be split into tokens of uneven length, so the same column can fall at a different token offset in each row. The model must reconstruct 2D relationships from this 1D sequence using only attention patterns, which is error-prone. Chain-of-thought alone does not reliably recover the lost structure. Converting to coordinate-based representations (e.g., 'wall at (2,3), open at (2,4)') partially helps by making adjacency explicit, but complex spatial reasoning (pathfinding, rotation, reflection) still requires external computation. The fundamental issue is that a row-by-row text serialization leaves 2D topology implicit, so the model has to infer it rather than read it.
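The linearization effect can be made concrete with a small sketch. It assumes a hypothetical 3-row grid joined with newlines and measures character offsets only; real tokenizers group characters differently, so token offsets will vary, but the asymmetry between horizontal and vertical neighbors is the same in kind. The name flat_index is illustrative.

```python
# Hypothetical 3-row grid; '#' is a wall, '.' is open.
ROWS = ["#####", "#...#", "#####"]
flat = "\n".join(ROWS)   # the 1D string the model actually receives
width = len(ROWS[0])


def flat_index(row, col):
    """Character offset of cell (row, col) in the newline-joined string."""
    return row * (width + 1) + col  # +1 for the newline ending each row


# Distance in the 1D string between a cell and its 2D neighbors.
right_gap = flat_index(1, 3) - flat_index(1, 2)  # horizontal neighbor
down_gap = flat_index(2, 2) - flat_index(1, 2)   # vertical neighbor
```

Here the horizontal neighbor sits one character away, while the vertical neighbor sits width + 1 characters away, a gap that grows with the grid's width. A coordinate tuple such as (1, 2) states the position directly and does not depend on this offset arithmetic.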
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:48:52.201258+00:00— report_created — created