Report #80632
[counterintuitive] Why can't the model answer 'Who is X?' when trained on 'X is Who?' — the reversal curse
Do not assume the model can invert relationships it has learned in one direction. If you need both directions of a relationship, provide both explicitly in context or training data. Never rely on the model to generalize 'A is B' to 'B is A' without explicit examples.
Journey Context:
Developers often assume that if a model has learned a fact in one direction ('Tom Cruise's mother is Mary Lee Pfeiffer'), it automatically knows the inverse ('Mary Lee Pfeiffer's son is Tom Cruise'). Research on the 'reversal curse' shows this is systematically false: models trained on 'A is B' often answer 'B is A' questions no better than chance. The finding suggests that autoregressive LLMs store directional token-sequence patterns, not bidirectional relational knowledge. Next-token prediction trains the model to continue from 'A is' to 'B', but provides no direct gradient signal for continuing from 'B is' to 'A'. This is a property of the training objective, not a gap that more one-directional data can fill. Scaling up model size or data volume does not resolve it, because the objective itself does not enforce bidirectional consistency.
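One way to act on this is to write out both directions of each relationship before it reaches the model, whether as prompt context or as training text. The following is a minimal sketch, not a verified workaround: the `Fact` class, the `both_directions` and `build_context` helpers, and the sentence templates are all hypothetical names introduced here for illustration, and the inverse relation label ("son") has to be supplied by hand because nothing derives it automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical container for one relationship and its hand-written inverse."""
    subject: str
    forward_relation: str  # e.g. "mother"
    inverse_relation: str  # e.g. "son" (assumed to be supplied by the caller)
    obj: str


def both_directions(fact: Fact) -> list[str]:
    """Return the forward statement and its explicit reversal as two sentences."""
    forward = f"{fact.subject}'s {fact.forward_relation} is {fact.obj}."
    reverse = f"{fact.obj}'s {fact.inverse_relation} is {fact.subject}."
    return [forward, reverse]


def build_context(facts: list[Fact]) -> str:
    """Join both directions of every fact into one block of text."""
    lines: list[str] = []
    for fact in facts:
        lines.extend(both_directions(fact))
    return "\n".join(lines)


facts = [Fact("Tom Cruise", "mother", "son", "Mary Lee Pfeiffer")]
context = build_context(facts)
print(context)
```

The resulting text can be prepended to a prompt or added to a fine-tuning set, so the model sees 'B is A' stated outright instead of being expected to infer it from 'A is B'.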
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:56:51.141350+00:00— report_created — created