Report #48213
[counterintuitive] Why the model knows a fact in one direction but not the reverse
When you need bidirectional knowledge, explicitly provide both directions in the training data or in the context. Do not assume that training on 'A is B' lets the model answer the reverse question, 'Which entity is B?', whose answer is A.
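One way to act on this advice is to write every fact in both directions before it goes into fine-tuning data or a prompt. The sketch below is illustrative only: the `Fact` record, its field names, and the sentence templates are assumptions made for this example, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical record for one directional fact and the name of its inverse."""
    subject: str
    forward_relation: str  # relation read from subject to obj, e.g. "mother"
    obj: str
    reverse_relation: str  # relation read from obj to subject, e.g. "son"


def both_directions(fact: Fact) -> list[str]:
    """Render one fact as two sentences, one per direction."""
    forward = f"{fact.subject}'s {fact.forward_relation} is {fact.obj}."
    reverse = f"{fact.obj}'s {fact.reverse_relation} is {fact.subject}."
    return [forward, reverse]


def augment(facts: list[Fact]) -> list[str]:
    """Expand facts into forward and reverse sentences, dropping duplicates."""
    seen: set[str] = set()
    sentences: list[str] = []
    for fact in facts:
        for sentence in both_directions(fact):
            if sentence not in seen:
                seen.add(sentence)
                sentences.append(sentence)
    return sentences


if __name__ == "__main__":
    example = Fact("Tom Cruise", "mother", "Mary Lee Pfeiffer", "son")
    for line in augment([example]):
        print(line)
```

The reverse relation is stored explicitly because it generally cannot be derived from the forward one ("mother" can invert to "son" or "daughter"). Whether the augmented sentences are sufficient for a given model is something to verify empirically.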
Journey Context:
Developers are often surprised when a model that correctly answers 'Who is Tom Cruise's mother?' fails on 'Who is Mary Lee Pfeiffer's son?' This is the Reversal Curse: autoregressive language models trained on 'A is B' do not automatically learn 'B is A'. The model is trained to predict each next token from the tokens before it, so the order in which a fact appears in the training data matters. Learning to continue 'Tom Cruise's mother is' with 'Mary Lee Pfeiffer' does not, by itself, make 'Tom Cruise' a likelier continuation of 'Mary Lee Pfeiffer's son is'. This is not a memory or attention problem; it follows from how next-token training stores facts in the weights. When the fact is present in the prompt, the model can usually reason in either direction, which is why supplying it in context works as a fix. The effect applies to any directional relationship: definitions, mappings, translations, and parent-child relations in code hierarchies. In coding contexts, this means a model that knows a function's signature may not reliably identify which function produces a given output type.
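For the coding case, the reverse mapping can be computed mechanically and placed in the prompt rather than expected from the model. The following is a minimal sketch, assuming the functions carry return annotations; `parse_port`, `count_words`, `read_name`, `index_by_return_type`, and `render_for_context` are hypothetical names invented for this example.

```python
from collections import defaultdict
from typing import get_type_hints


# Hypothetical functions standing in for a real codebase.
def parse_port(text: str) -> int:
    return int(text)


def count_words(text: str) -> int:
    return len(text.split())


def read_name(record: dict) -> str:
    return str(record["name"])


def index_by_return_type(functions):
    """Map each annotated return type to the names of functions that return it."""
    index = defaultdict(list)
    for fn in functions:
        return_type = get_type_hints(fn).get("return")
        if return_type is not None:  # skip functions without a return annotation
            index[return_type].append(fn.__name__)
    return dict(index)


def render_for_context(index):
    """Format the reverse index as plain text suitable for pasting into a prompt."""
    lines = []
    for return_type, names in index.items():
        type_name = getattr(return_type, "__name__", str(return_type))
        lines.append(f"Functions returning {type_name}: {', '.join(sorted(names))}")
    return "\n".join(lines)


if __name__ == "__main__":
    index = index_by_return_type([parse_port, count_words, read_name])
    print(render_for_context(index))
```

With the rendered text in context, a question such as "which function returns an int?" becomes a lookup over the prompt instead of a reverse recall from the weights.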
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:24:04.339617+00:00— report_created — created