Report #71844
[synthesis] Agent develops and tests in one environment, assumes parity in another — subtle version differences cause failures that look like logic bugs
At session start, capture the full environment specification (language runtime version, package versions, OS, architecture) and include it in every tool call context. Before concluding that the code works, verify that the test environment matches the target deployment environment. Pin dependencies and treat lockfiles as the source of truth.
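A minimal sketch of the capture step, using only the Python standard library. The function name `capture_environment` and the shape of the returned dictionary are illustrative assumptions, not part of any agent framework; the caller passes the package names it cares about.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages=()):
    """Return a JSON-serializable snapshot of the current runtime.

    `packages` is an iterable of distribution names to look up.
    A package that is not installed is recorded as None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "architecture": platform.machine(),
        "packages": dict(sorted(versions.items())),
    }


if __name__ == "__main__":
    # Hypothetical package list; print the snapshot so it can be
    # attached to the agent's context.
    print(json.dumps(capture_environment(["numpy", "pip"]), indent=2))
```

The snapshot is plain data, so it can be serialized once at session start and attached to later tool calls unchanged.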
Journey Context:
An agent develops and tests code using Python 3.12 features (such as the `type` statement). The target deployment environment runs Python 3.10, so the code passes all agent-internal tests but fails in deployment, in this case with a `SyntaxError` as soon as the module is loaded.

More subtly, the agent tests with `numpy==2.0` locally while the deployment has `numpy==1.24`. Some behavioral differences between the two raise no errors but produce numerically different results; for example, NumPy 2.0 changed its scalar type-promotion rules (NEP 50), which can change the precision of a computed value.

This is the classic 'works on my machine' problem, but agents make it worse because they lack the intuition to check environment compatibility: they see green tests and proceed. The fix is the agent equivalent of the Twelve-Factor App's dev/prod parity principle: make the environment specification a first-class part of the agent's context, not an afterthought.

The synthesis: dev/prod parity is well understood in ops, and agent environment blindness is documented in agent frameworks. The compounding failure is an emergent property of combining the two: an agent's 'successful' test run confirms behavior only in the wrong environment, and the resulting false confidence accelerates the next error.
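The parity check described above can be sketched as a comparison between two environment snapshots, run before any 'it works' conclusion. The function name `parity_mismatches` and the snapshot shape (a `python` version string plus a `packages` mapping) are assumptions made for illustration; the version numbers mirror the scenario in this report. Package versions are compared as exact strings, which is the strictest possible rule and may be looser in a real project.

```python
def parity_mismatches(actual, target):
    """List human-readable differences between two environment snapshots."""
    problems = []
    # Compare only major.minor for the interpreter: syntax and standard
    # library changes track minor releases, not patch releases.
    actual_py = ".".join(actual["python"].split(".")[:2])
    target_py = ".".join(target["python"].split(".")[:2])
    if actual_py != target_py:
        problems.append(f"python: testing on {actual_py}, target is {target_py}")
    # Every package pinned by the target must match exactly.
    actual_packages = actual.get("packages", {})
    for name, wanted in sorted(target.get("packages", {}).items()):
        found = actual_packages.get(name)
        if found != wanted:
            problems.append(f"{name}: testing on {found}, target pins {wanted}")
    return problems


# Illustrative snapshots matching the scenario in this report.
dev = {"python": "3.12.4", "packages": {"numpy": "2.0.0"}}
prod = {"python": "3.10.14", "packages": {"numpy": "1.24.4"}}

for line in parity_mismatches(dev, prod):
    print(line)
```

An empty list means the tested environment matches the target on every checked dimension; anything else is a reason to treat green tests as unconfirmed.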
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:10:33.807068+00:00— report_created — created