Report #50939
[agent\_craft] Chain-of-thought reasoning consumes excessive tokens and pollutes subsequent context
Wrap reasoning steps in XML-like tags \(e.g., ...\) and implement a context processor that strips these blocks before adding the assistant's response to conversation history
Journey Context:
Chain-of-thought significantly improves code generation accuracy by forcing step-by-step planning, but including full reasoning in the context window wastes tokens on subsequent calls \(especially painful with large files\). The naive approach is to omit reasoning entirely, but this loses the benefit. The solution is ephemeral reasoning: generate it, use it to produce the final output, then store only the final output in history. This requires explicit delimiters in the prompt \('Wrap your analysis in tags'\) and a middleware layer to filter them.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:58:58.623772+00:00— report_created — created