Report #40471
[counterintuitive] Why can't the model reason about spatial relationships and physical layouts from text descriptions despite fluent spatial language
Convert spatial reasoning tasks into coordinate systems, code, or diagram-generation tools that the model can manipulate symbolically. Do not ask models to reason about physical layouts, rotations, or spatial transformations from natural-language descriptions alone: they pattern-match rather than simulate.
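Below is a minimal sketch of the coordinate-system workaround. It assumes the layout has already been transcribed from the text into explicit coordinates. Each object becomes an axis-aligned footprint, and relational questions become arithmetic that can be checked. The `Box` class, the helper names, and the room coordinates are hypothetical. The convention that x grows to the east and y to the north is an assumption chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned footprint.

    (x, y) is the lower-left corner; x grows east, y grows north (assumed convention).
    """
    name: str
    x: float
    y: float
    w: float
    h: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def left_of(a: Box, b: Box) -> bool:
    """True if a lies entirely west of b (no horizontal overlap)."""
    return a.x + a.w <= b.x


def overlaps(a: Box, b: Box) -> bool:
    """True if the two footprints share interior area."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def distance(a: Box, b: Box) -> float:
    """Euclidean distance between footprint centers."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)


# Hypothetical room layout transcribed from a text description, in metres.
sofa = Box("sofa", 2.0, 0.0, 2.0, 1.0)
lamp = Box("lamp", 0.5, 0.2, 0.5, 0.5)
table = Box("table", 2.5, 1.5, 1.0, 0.6)

print(left_of(lamp, sofa))    # True: lamp ends at x=1.0, sofa starts at x=2.0
print(overlaps(table, sofa))  # False: table starts at y=1.5, sofa ends at y=1.0
print(distance(lamp, table))
```

With this design, the model only has to do the transcription step, which a person can audit. The relations themselves are computed rather than recalled from similar-sounding text.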
Journey Context:
The common belief is that models can reason about spatial relationships because they describe them fluently in natural language. In practice, models have no dedicated mechanism for spatial simulation. When asked 'if you rotate object A 90 degrees clockwise, does it still fit in container B?', the model draws on textual patterns from similar descriptions in its training data rather than carrying out the rotation in an explicit spatial workspace. A transformer operates on sequences of token embeddings. It has no built-in 2D or 3D grid or coordinate frame in which to place objects and transform them. This is why models can discuss spatial concepts conversationally yet frequently fail on novel spatial tasks that require step-by-step simulation. Fluency in spatial language is not evidence of spatial reasoning ability. It reflects having read many spatial descriptions during training.
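The rotate-and-fit question above can be answered by computation instead of pattern-matching. This sketch makes three assumptions. Object A is a 2D polygon given as a list of vertices. Container B is an axis-aligned rectangle. After the 90-degree turn, A may be translated but not rotated further. The shape, the dimensions, and the function names are hypothetical.

```python
# Hypothetical representation: a 2D shape is a list of polygon vertices (x, y).
Point = tuple[float, float]


def rotate_cw_90(points: list[Point]) -> list[Point]:
    """Rotate 90 degrees clockwise about the origin: (x, y) -> (y, -x)."""
    return [(y, -x) for x, y in points]


def bbox_size(points: list[Point]) -> tuple[float, float]:
    """Width and height of the shape's axis-aligned bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return max(xs) - min(xs), max(ys) - min(ys)


def fits(points: list[Point], width: float, height: float) -> bool:
    """Translation only: a shape fits inside an axis-aligned width x height
    rectangle exactly when its bounding box does."""
    w, h = bbox_size(points)
    return w <= width and h <= height


# Object A: an L-shaped piece, 3 units wide and 2 tall (hypothetical numbers).
obj_a = [(0, 0), (3, 0), (3, 1), (1, 1), (1, 2), (0, 2)]
# Container B: 2.5 wide and 3.5 tall (hypothetical numbers).
B_WIDTH, B_HEIGHT = 2.5, 3.5

print(fits(obj_a, B_WIDTH, B_HEIGHT))                # False: width 3 > 2.5
print(fits(rotate_cw_90(obj_a), B_WIDTH, B_HEIGHT))  # True: now 2 wide, 3 tall
```

Under these assumptions, the bounding-box comparison is exact rather than approximate. A shape lies inside an axis-aligned rectangle precisely when its extreme x and y coordinates do.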
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:24:07.295077+00:00— report_created — created