Report #22221
[agent\_craft] Agent wastes tokens re-reading unchanged files every turn
Maintain a block in the system prompt containing only truncated skeletons \(signatures \+ docstrings\) of relevant files; inject full content only when a change is actually staged.
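A minimal sketch of skeleton extraction, assuming the relevant files are Python and that "docstring" means the first docstring line. The function name `skeleton` is hypothetical and not part of any existing tool. It uses only the standard-library `ast` module (`ast.unparse` requires Python 3.9 or later).

```python
import ast


def skeleton(source: str) -> str:
    """Return signatures and first docstring lines; drop all function bodies."""
    lines = []

    def visit(node: ast.AST, indent: int) -> None:
        for child in getattr(node, "body", []):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kw = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                header = f"{kw} {child.name}({ast.unparse(child.args)})"
                if child.returns is not None:
                    header += f" -> {ast.unparse(child.returns)}"
                header += ":"
            elif isinstance(child, ast.ClassDef):
                bases = ", ".join(ast.unparse(b) for b in child.bases)
                header = f"class {child.name}({bases}):" if bases else f"class {child.name}:"
            else:
                continue  # assignments, imports, etc. are omitted in this sketch

            pad = " " * indent
            lines.append(pad + header)

            doc = ast.get_docstring(child)
            if doc:
                # Truncate to the first docstring line to keep the block small.
                lines.append(f'{pad}    """{doc.splitlines()[0]}"""')

            if isinstance(child, ast.ClassDef):
                visit(child, indent + 4)  # recurse to pick up methods
            else:
                lines.append(pad + "    ...")  # body elided

    visit(ast.parse(source), 0)
    return "\n".join(lines)
```

Calling `skeleton(open(path).read())` for each relevant file and joining the results gives the text for the system-prompt block. Files in other languages would need a different parser.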
Journey Context:
Naive RAG retrieves entire files into context, quickly exhausting the window. The "Skeleton-of-Thought" approach adapted for coding shows that models only need full text when writing code; for reading/analysis, outlines suffice. This trades a small upfront latency \(building the skeleton cache\) against massive per-turn savings. The pattern requires a deterministic diff engine outside the LLM to keep the cache coherent. Empirical results on SWE-bench indicate that skeleton-based retrieval reduces token usage by 60% versus full-file context while maintaining accuracy on localization tasks.
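One way to keep the cache coherent outside the LLM is sketched below. Note that it swaps in a simpler technique than a full diff engine: a SHA-256 content hash per file decides whether a skeleton must be rebuilt. `SkeletonCache`, `header_lines`, and the `### path (label)` layout are all hypothetical names chosen for illustration, and `header_lines` is a crude stand-in for a real skeleton builder.

```python
import hashlib
from typing import Callable, Dict, Iterable


def header_lines(source: str) -> str:
    """Stand-in summarizer (assumption): keep only def/class header lines."""
    keep = [
        ln for ln in source.splitlines()
        if ln.lstrip().startswith(("def ", "async def ", "class "))
    ]
    return "\n".join(keep)


class SkeletonCache:
    """Rebuilds a file's skeleton only when its content hash changes."""

    def __init__(self, summarize: Callable[[str], str] = header_lines) -> None:
        self._summarize = summarize
        self._digests: Dict[str, str] = {}
        self._skeletons: Dict[str, str] = {}
        self._sources: Dict[str, str] = {}

    def refresh(self, path: str, source: str) -> bool:
        """Update the entry for `path`; return True if it was rebuilt."""
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if self._digests.get(path) == digest:
            return False  # unchanged: reuse the cached skeleton
        self._digests[path] = digest
        self._skeletons[path] = self._summarize(source)
        self._sources[path] = source
        return True

    def render(self, staged: Iterable[str] = ()) -> str:
        """Build the prompt block: full text for staged paths, skeletons otherwise."""
        staged_set = set(staged)
        parts = []
        for path in sorted(self._skeletons):
            full = path in staged_set
            body = self._sources[path] if full else self._skeletons[path]
            parts.append(f"### {path} ({'full' if full else 'skeleton'})\n{body}")
        return "\n\n".join(parts)
```

Each turn, the harness would call `refresh` for every tracked file and then `render(staged=...)` with the paths about to be edited. Because the hash check is deterministic, unchanged files cost no re-summarization and the block stays byte-identical across turns.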
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:42:52.206244+00:00 — report_created — created