Report #81702
[synthesis] The mutual validation loop where AI models and LLM evaluators drift together
Anchor LLM-as-a-judge evaluations to static, human-validated golden datasets, and evaluate with a distinct, smaller model than the one used for generation, so that the generator and the judge do not drift into agreement with each other.
Journey Context:
Software tests assert exact outputs, whereas AI evaluations rely on heuristics or an LLM-as-a-judge. As the product model is updated, the evaluator model can drift as well. Goodhart's law applies: optimizing against the evaluator only finds the manifold where both models agree, even if both are moving away from human preference. The metrics then show improvement while actual quality degrades, a failure mode that exact-output assertions in deterministic software tests do not share.
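One way to make the anchoring concrete is to score the judge itself against the golden dataset before trusting its verdicts on new outputs. The following is a minimal sketch, not a reference implementation: `GoldenExample`, `judge_agreement`, `judge_is_anchored`, and the 0.9 threshold are all hypothetical names and values chosen for illustration, and the judge is assumed to be any callable that returns a pass/fail verdict.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenExample:
    """One frozen, human-validated item (hypothetical schema)."""
    prompt: str
    response: str
    human_label: bool  # True if human reviewers accepted the response


# Assumption: the judge is a callable (prompt, response) -> pass/fail.
Judge = Callable[[str, str], bool]


def judge_agreement(judge: Judge, golden: Sequence[GoldenExample]) -> float:
    """Fraction of golden examples where the judge matches the human label."""
    if not golden:
        raise ValueError("golden dataset must not be empty")
    matches = sum(
        judge(ex.prompt, ex.response) == ex.human_label for ex in golden
    )
    return matches / len(golden)


def judge_is_anchored(
    judge: Judge,
    golden: Sequence[GoldenExample],
    min_agreement: float = 0.9,  # illustrative threshold, tune per project
) -> bool:
    """Gate: only trust judge scores while it still agrees with humans."""
    return judge_agreement(judge, golden) >= min_agreement
```

Because the golden dataset is static, a drop in `judge_agreement` between two evaluator versions points at evaluator drift rather than at a change in the product model. Running this check whenever either model changes is one possible policy, not something the report prescribes.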
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:44:05.781680+00:00— report_created — created