Report #29126
[counterintuitive] AI generates idiomatic open-source code that conflicts with internal codebase conventions and architecture
Give the AI explicit style guides, internal API docs, and representative example code from the target codebase before generation. Verify its output against the codebase's own patterns, not general best practices, and run the existing linting and formatting tools on AI output immediately.
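A minimal sketch of the "provide context before generation" step, assuming the style guide, API docs, and example snippets are already available as plain strings. The names `CodebaseContext` and `build_prompt`, and the section headings, are hypothetical and not tied to any particular AI tool:

```python
from dataclasses import dataclass, field


@dataclass
class CodebaseContext:
    """Project-specific material to place ahead of the task (hypothetical shape)."""
    style_guide: str
    api_docs: str
    examples: list[str] = field(default_factory=list)


def build_prompt(task: str, ctx: CodebaseContext) -> str:
    """Assemble a prompt with internal conventions first and the task last."""
    sections = [
        "## Internal style guide\n" + ctx.style_guide.strip(),
        "## Internal API documentation\n" + ctx.api_docs.strip(),
    ]
    # Number each representative snippet taken from the target codebase.
    for i, example in enumerate(ctx.examples, start=1):
        sections.append(f"## Example {i} from the target codebase\n{example.strip()}")
    sections.append(
        "## Task\n" + task.strip()
        + "\nFollow the conventions above, even where they differ from common open-source practice."
    )
    return "\n\n".join(sections)
```

The resulting string would be passed to whichever generation tool is in use. The ordering (conventions first, task last) is a design choice of this sketch, not a verified requirement.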
Journey Context:
AI is trained predominantly on public GitHub repositories. It has strong priors for popular open-source idioms (Express patterns, Django conventions, React hooks) but weak or wrong priors for internal frameworks and conventions. When generating code for a private codebase, it defaults to the most common public patterns, which may directly conflict with internal architecture decisions. The result looks like 'bad code' but is really a distribution shift problem: the model is correctly predicting the most likely code in its training distribution, and that distribution is not your codebase.
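One way to check generated code against codebase patterns rather than general best practices is to flag public-library imports for which the codebase mandates an internal wrapper. The sketch below assumes the generated code is Python and that such a mapping exists. The module names `acme_http` and `acme_logging` are invented for illustration, and this check supplements the project's existing linter rather than replacing it:

```python
import ast

# Hypothetical mapping: public library -> internal wrapper the codebase requires.
INTERNAL_REPLACEMENTS = {
    "requests": "acme_http",
    "logging": "acme_logging",
}


def find_convention_violations(source: str) -> list[str]:
    """Return one message per import that bypasses an internal wrapper."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if top_level in INTERNAL_REPLACEMENTS:
                violations.append(
                    f"line {node.lineno}: import of '{top_level}' should use "
                    f"'{INTERNAL_REPLACEMENTS[top_level]}'"
                )
    return violations
```

Running `find_convention_violations` on AI output before review surfaces the cases where the model fell back to a popular public idiom that the internal architecture has replaced.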
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:16:50.508357+00:00— report_created — created