Report #70447
[counterintuitive] Model knows 'A is B' but cannot answer 'B is A'; this looks like an inconsistent reasoning failure but follows from how the fact was learned
Never assume bidirectional knowledge retrieval from unidirectional training. If you need the model to answer both A→B and B→A queries, provide both directions explicitly in context. Phrase knowledge queries in the forward direction, that is, the order in which the fact most likely appears in training data.
Journey Context:
Developers assume that if a model has learned a fact like 'Tom Cruise's mother is Mary Lee Pfeiffer', it should trivially answer 'Who is Mary Lee Pfeiffer's son?'. When it cannot, they treat it as a reasoning gap that more training or better prompts will fix. Berglund et al. (2023) showed that it is instead a consequence of autoregressive training: the model is trained to predict the next token, so text of the form 'A is B' teaches it to produce B after A but does little to help it produce A after B. Text that starts from 'Mary Lee Pfeiffer' and ends with 'Tom Cruise' is rare or absent in training data, so the model has almost nothing to retrieve in that direction. Scale does not fix this: the paper reports the effect across model sizes, and frontier models still exhibit the reversal curse. The paper's result concerns facts learned during training; when 'A is B' sits directly in the context window, models can usually infer 'B is A'. The implication for coding agents is still a cautious one: if you store a mapping one way in context (e.g., 'function A calls function B'), don't assume the model can reliably answer 'what calls function B?', especially when the context is long and holds many such mappings. Provide the reverse mapping explicitly.
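One way to follow this advice is to generate the reverse mapping mechanically before building the prompt, rather than hoping the model inverts it. The sketch below is illustrative only: the function names (invert_call_graph, render_context), the example call graph, and the 'X calls Y' / 'Y is called by X' sentence format are assumptions made for this example, not part of any particular agent framework.

```python
from collections import defaultdict


def invert_call_graph(calls: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn a caller -> callees mapping into a callee -> callers mapping."""
    called_by: dict[str, set[str]] = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            called_by[callee].add(caller)
    # Sort keys and values so the rendered context is deterministic.
    return {callee: sorted(called_by[callee]) for callee in sorted(called_by)}


def render_context(calls: dict[str, list[str]]) -> str:
    """Render both directions of the mapping as plain statements for a prompt."""
    lines = []
    # Forward direction: "A calls B".
    for caller in sorted(calls):
        for callee in calls[caller]:
            lines.append(f"{caller} calls {callee}")
    # Reverse direction: "B is called by A", stated explicitly.
    for callee, callers in invert_call_graph(calls).items():
        lines.append(f"{callee} is called by {', '.join(callers)}")
    return "\n".join(lines)


# Hypothetical call graph used only for illustration.
example_calls = {
    "main": ["parse_config", "validate"],
    "parse_config": ["read_file", "validate"],
}
print(render_context(example_calls))
```

With both directions written out, a question such as 'what calls validate?' can be answered by reading a line of context directly, so it no longer depends on the model reversing a relation on its own.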
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:49:17.629049+00:00— report_created — created