Report #52000
[counterintuitive] Model knows fact X, so it should answer correctly when I query the same fact in reverse
Do not assume bidirectional knowledge from unidirectional training exposure. If you need both 'Who is X's mother?' and 'Whose mother is Y?' to work, ensure both directional patterns appear in training data or few-shot examples. Test both directions explicitly; use structured data lookup for critical bidirectional queries.
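A minimal sketch of the structured-lookup option, assuming facts can be stored as (subject, relation, object) triples. The class name `BidirectionalIndex` and its methods are hypothetical, chosen for illustration only. The point is that each fact is indexed in both directions at write time, so a reverse query never depends on what the model happened to learn.

```python
from collections import defaultdict


class BidirectionalIndex:
    """Hypothetical helper: stores facts once, answers queries in both directions."""

    def __init__(self):
        self._forward = defaultdict(set)  # (subject, relation) -> set of objects
        self._reverse = defaultdict(set)  # (relation, object) -> set of subjects

    def add(self, subject, relation, obj):
        # Index the same fact under both keys so neither direction is missing.
        self._forward[(subject, relation)].add(obj)
        self._reverse[(relation, obj)].add(subject)

    def objects(self, subject, relation):
        # Forward query, e.g. "Who is X's mother?"
        return sorted(self._forward.get((subject, relation), ()))

    def subjects(self, relation, obj):
        # Reverse query, e.g. "Whose mother is Y?"
        return sorted(self._reverse.get((relation, obj), ()))


index = BidirectionalIndex()
index.add("Tom Cruise", "mother", "Mary Lee Pfeiffer")

forward_answer = index.objects("Tom Cruise", "mother")
reverse_answer = index.subjects("mother", "Mary Lee Pfeiffer")
```

In an application, the retrieved answer would typically be placed in the prompt or returned directly, rather than asking the model to recall the reversed fact from its weights.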
Journey Context:
If a model learns 'Tom Cruise's mother is Mary Lee Pfeiffer' from training data, it may still be unable to answer 'Who is Mary Lee Pfeiffer's son?' reliably. This 'reversal curse' follows from how autoregressive models are trained: they learn statistical patterns in the direction the text appears in the training data. Reversing a relationship requires a different statistical pattern, and that pattern may not exist in the training corpus. Scaling up does not fix this: the paper that described the reversal curse reported the effect across the model sizes it tested, up to 175B parameters. Developers are baffled when a model answers 'What is the capital of France?' perfectly but fails 'What country has Paris as its capital?', and they assume it's a fluke or a bad prompt. It's neither. The model genuinely has a weaker (or absent) association in the reverse direction. For knowledge-intensive applications, this means you must test both directions of every critical relationship and supplement the model with structured retrieval.
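One way to test both directions explicitly is sketched below. It assumes the model is reachable through some callable `ask(question) -> str`. That callable, the `DirectionCase` type, and the stub model are all hypothetical stand-ins rather than any real API. Answers are checked with a simple case-insensitive substring match, which is crude and would likely need replacing in a real evaluation.

```python
from typing import Callable, List, NamedTuple, Tuple


class DirectionCase(NamedTuple):
    """Hypothetical test case: one relationship, asked forward and in reverse."""
    forward_question: str
    forward_answer: str
    reverse_question: str
    reverse_answer: str


def check_both_directions(
    ask: Callable[[str], str], cases: List[DirectionCase]
) -> List[Tuple[DirectionCase, bool, bool]]:
    """Return (case, forward_ok, reverse_ok) for every case."""
    results = []
    for case in cases:
        forward_ok = case.forward_answer.lower() in ask(case.forward_question).lower()
        reverse_ok = case.reverse_answer.lower() in ask(case.reverse_question).lower()
        results.append((case, forward_ok, reverse_ok))
    return results


def one_directional_stub(question: str) -> str:
    # Stand-in for a real model call: it only "knows" the forward direction.
    canned = {"Who is Tom Cruise's mother?": "Mary Lee Pfeiffer"}
    return canned.get(question, "I don't know.")


cases = [
    DirectionCase(
        forward_question="Who is Tom Cruise's mother?",
        forward_answer="Mary Lee Pfeiffer",
        reverse_question="Who is Mary Lee Pfeiffer's son?",
        reverse_answer="Tom Cruise",
    )
]

results = check_both_directions(one_directional_stub, cases)
# Cases that pass forward but fail in reverse are candidates for structured retrieval.
asymmetric = [case for case, forward_ok, reverse_ok in results if forward_ok and not reverse_ok]
```

Swapping `one_directional_stub` for a wrapper around the actual model call would turn this into a regression check that flags relationships the model can only answer in one direction.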
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:46:29.421712+00:00— report_created — created