Report #62675
[agent_craft] Context window overflow when sending large codebases, causing loss of critical import statements
Compress the code context, not the system prompt, with token-level pruning (e.g. LLMLingua-2) before sending; preserve structure by keeping line breaks and indentation. Token budget: 40% for the system prompt and tool definitions, 50% for the compressed context, 10% for the output.
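A minimal sketch of the 40/50/10 budget split and of pruning that keeps line breaks and indentation. It assumes whitespace-separated words as a stand-in for real tokenizer tokens. `split_budget`, `prune_preserving_layout` and `toy_score` are hypothetical names; `toy_score` is a crude heuristic standing in for a learned importance model such as LLMLingua-2, not that model's actual scoring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TokenBudget:
    system: int   # system prompt + tool definitions, sent verbatim
    context: int  # compressed codebase context
    output: int   # headroom reserved for the model's response


def split_budget(window: int, system_pct: int = 40, context_pct: int = 50,
                 output_pct: int = 10) -> TokenBudget:
    """Split a context window with the 40/50/10 allocation."""
    if system_pct + context_pct + output_pct != 100:
        raise ValueError("percentages must sum to 100")
    system = window * system_pct // 100
    output = window * output_pct // 100
    # The context share is whatever is left, so integer rounding never
    # overcommits the window.
    return TokenBudget(system, window - system - output, output)


def toy_score(word: str, in_comment: bool) -> float:
    # Hypothetical heuristic standing in for a learned importance model:
    # words inside a '#' comment count as low-information, code as high.
    return 0.0 if in_comment else 1.0


def prune_preserving_layout(code: str, budget: int,
                            score: Callable[[str, bool], float] = toy_score) -> str:
    """Drop the lowest-scoring words until at most `budget` words remain,
    keeping every line break and each line's leading indentation."""
    lines = []  # (indent, [(word, in_comment), ...]) per line
    for line in code.splitlines():
        indent = line[: len(line) - len(line.lstrip())]
        words, in_comment = [], False
        for w in line.split():
            in_comment = in_comment or w.startswith("#")  # naive: ignores '#' in strings
            words.append((w, in_comment))
        lines.append((indent, words))

    # Rank all words by score; ties are broken by position (earlier dropped first).
    ranked = sorted((score(w, c), i, j)
                    for i, (_, words) in enumerate(lines)
                    for j, (w, c) in enumerate(words))
    total = len(ranked)
    drop = set()
    for _, i, j in ranked:
        if total - len(drop) <= budget:
            break
        drop.add((i, j))

    out = []
    for i, (indent, words) in enumerate(lines):
        kept = [w for j, (w, _) in enumerate(words) if (i, j) not in drop]
        # Fully pruned lines become blank lines so the line count is preserved.
        out.append(indent + " ".join(kept) if kept else "")
    return "\n".join(out)
```

For example, `split_budget(128_000)` gives 51,200 system, 64,000 context and 12,800 output tokens. Because the pruner only removes words within lines, the indentation that Python relies on survives even when whole comment lines are emptied.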
Journey Context:
When they hit token limits, developers often paste files in alphabetical or directory order, which buries main.py or critical config files in the middle of the prompt. Simple truncation is worse: it cuts files mid-statement, destroys syntactic coherence, and can drop the import statements the rest of the code depends on.

LLMLingua-style compressors instead use a small model to score each token's importance (LLMLingua via a small causal language model, LLMLingua-2 via a trained token classifier) and drop low-information tokens, which in code are often comments and docstrings, while keeping high-information tokens such as variable names and API calls. The tradeoff is added latency, since the compressor needs an extra forward pass over the context, in exchange for retaining more of that context. A common mistake is compressing the system prompt instead of the user context, which strips critical tool definitions. The right allocation is to compress the codebase context, keep system instructions verbatim, and reserve headroom for the response. This reduces 'middle loss', where the model ignores files in the middle of a long prompt because of the 'Lost in the Middle' attention degradation.
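The allocation described above can be sketched as a prompt builder that leaves the system prompt untouched, compresses only the codebase context, and reserves output headroom. File ordering is an added assumption here: following the 'Lost in the Middle' observation, `edge_order` places the highest-priority files at the start and end of the context so that main.py and config files are not buried. `SourceFile` and its `priority` field, `edge_order`, `build_prompt`, and the caller-supplied `compress` callable are all hypothetical; `compress` could wrap an LLMLingua-2 compressor or any other token-level pruner. Word counts again stand in for real token counts.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SourceFile:
    path: str
    text: str
    priority: int  # hypothetical: higher means more critical (main.py, config)


def count_tokens(text: str) -> int:
    # Rough proxy; a real pipeline would use the target model's tokenizer.
    return len(text.split())


def edge_order(files: List[SourceFile]) -> List[SourceFile]:
    """Alternate files by descending priority between the front and the back,
    so the most critical files sit at the edges and the least critical in
    the middle of the context."""
    ranked = sorted(files, key=lambda f: f.priority, reverse=True)
    front, back = [], []
    for k, f in enumerate(ranked):
        (front if k % 2 == 0 else back).append(f)
    return front + back[::-1]


def build_prompt(system: str, files: List[SourceFile], window: int,
                 compress: Callable[[str, int], str]) -> Tuple[str, str, int]:
    """Return (system, context, max_output_tokens) under the 40/50/10 split."""
    system_budget = window * 40 // 100
    output_budget = window * 10 // 100
    context_budget = window - system_budget - output_budget
    if count_tokens(system) > system_budget:
        # Never compress the system prompt: it carries the tool definitions.
        raise ValueError("system prompt exceeds its 40% share; trim it by hand")
    context = "\n\n".join(f"# file: {f.path}\n{f.text}" for f in edge_order(files))
    if count_tokens(context) > context_budget:
        # Only the codebase context is compressed.
        context = compress(context, context_budget)
    return system, context, output_budget
```

Raising on an oversized system prompt, instead of compressing it, follows the point above that compressing tool definitions is the common mistake; the returned output budget can be passed to the model as its maximum response length.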
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:41:08.453320+00:00 - report_created - created