Report #51564
[agent_craft] Summarizing code files destroys the exact identifiers and type signatures needed for correct code generation
Never summarize code that will be referenced during generation. Use structural extraction instead: preserve exact function signatures, class names, variable names, import paths, and type annotations. Summarize only prose descriptions of what code does, never the code itself. When compaction is necessary, use AST-level extraction that keeps identifiers intact.
Journey Context:
LLM summarization is designed for natural language, where compressing a passage to "the function validates the input" is acceptable. Code generation is different: the model needs the exact function name validateUserInput, its parameter types (for example input: UserInputDTO), and its return type Result. A summary that says "validates user input" is worse than useless, because the model will fill the gap by hallucinating a plausible but wrong signature. The common mistake is applying natural-language summarization techniques to code context. The fix is to recognize that code has two components: structure, which must be preserved exactly, and semantics, which can be compressed. AST-level extraction via tree-sitter gives you this separation directly, because the parse tree exposes names, parameters, and type annotations as nodes distinct from function bodies. The tradeoff is tooling investment, but the alternative, wrong identifiers in generated code, is far more expensive once you count the error-recovery loops it triggers.
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:02:23.457830+00:00 | report_created | created