Report #93369
[counterintuitive] Will AI perform well on my codebase if it works well on popular frameworks?
Evaluate AI on your own codebase's patterns before trusting it. Provide codebase-specific examples, conventions, and internal API documentation in context. For internal frameworks, domain-specific code, and uncommon languages, write explicit specifications and constraints. Test AI output against your actual codebase, not just standard benchmarks.
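As a rough illustration of "providing context", the sketch below assembles conventions, internal API documentation, and example implementations into a single prompt ahead of the task. The function name `build_context`, the section titles, and the sample `OrderRepository` content are all hypothetical; nothing here is tied to a particular AI tool or model API.

```python
def build_context(task, conventions, api_docs, examples):
    """Assemble a prompt that puts codebase-specific material ahead of the task.

    All parameter names and section titles are illustrative assumptions.
    Empty sections are skipped so the prompt carries no blank headings.
    """
    sections = [
        ("Conventions", "\n".join(f"- {c}" for c in conventions)),
        ("Internal API documentation", api_docs.strip()),
        ("Example implementations", "\n\n".join(e.strip() for e in examples)),
        ("Task", task.strip()),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)


# Hypothetical inputs standing in for your real conventions and docs.
prompt = build_context(
    task="Add a method that archives orders older than 90 days.",
    conventions=["Repositories never commit; the caller owns the transaction."],
    api_docs=(
        "OrderRepository.find_by_id(order_id) -> Order\n"
        "OrderRepository.save(order) -> None"
    ),
    examples=["def load_order(repo, order_id):\n    return repo.find_by_id(order_id)"],
)
print(prompt)
```

The task goes last so the model reads the constraints before the request. Whether this ordering helps for a given model is something to verify on your own codebase.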
Journey Context:
AI coding performance is heavily determined by how well a technology is represented in the training data. Models perform remarkably well on Python, JavaScript, React, and other heavily represented technologies. Developers who see this performance naturally assume it transfers to their internal frameworks, domain-specific languages, and less common tech stacks. It does not. This is distribution shift, a fundamental ML concept: model performance degrades on inputs that differ from the training data. In practice, AI will generate excellent React components but hallucinate methods on your internal ORM; it will write correct Python but misuse your proprietary messaging library; it will handle common SQL but generate invalid queries for your specialized time-series database. The performance drop is often not gradual but a cliff. The counterintuitive aspect is that developers see AI excelling on common tasks and extrapolate, while AI capability is in fact extremely uneven. The fix is not to avoid AI on uncommon stacks but to invest heavily in providing context: internal API docs, codebase conventions, and example implementations. This moves the model's input closer to the distribution it handles well.
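One cheap way to test generated code against the actual codebase is to check that every method it calls on an internal object really exists. The sketch below is a minimal example using Python's standard `ast` module. `OrderRepository`, the receiver name `repo`, and the `generated` snippet are hypothetical stand-ins for your own ORM class and real model output. It only catches direct calls of the form `repo.method(...)`, so treat it as a first filter rather than a full check.

```python
import ast


class OrderRepository:
    """Hypothetical internal ORM class, standing in for your real API."""

    def find_by_id(self, order_id): ...

    def save(self, order): ...


def public_methods(cls):
    """Names of the public callables defined on (or inherited by) `cls`."""
    return {
        name
        for name in dir(cls)
        if not name.startswith("_") and callable(getattr(cls, name))
    }


def unknown_method_calls(source, receiver, cls):
    """Return method names called on `receiver` in `source` that `cls` lacks."""
    known = public_methods(cls)
    unknown = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == receiver
            and node.func.attr not in known
        ):
            unknown.add(node.func.attr)
    return sorted(unknown)


# Hypothetical AI output: `upsert_many` does not exist on OrderRepository.
generated = """
order = repo.find_by_id(42)
repo.upsert_many([order])
repo.save(order)
"""

print(unknown_method_calls(generated, "repo", OrderRepository))  # ['upsert_many']
```

Running a check like this over a batch of generated snippets gives a rough hallucination rate for your internal APIs, which you can compare before and after adding documentation and examples to the context.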
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:18:27.726879+00:00— report_created — created