Report #78256
[cost\_intel] Re-injecting full raw conversation history for iterative summarization or code refactoring
Use rolling summarization or a map-reduce pattern to keep the context sent on each request strictly bounded, avoiding both the O\(n^2\) attention cost and the growing input-token bill that come with token bloat.
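A minimal sketch of the rolling-summary idea, under stated assumptions: the RollingContext class, the whitespace token counter, and the truncating summarizer are all hypothetical names introduced here for illustration. In practice the summarizer would wrap a model call, and token counts would come from the provider's tokenizer.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude whitespace estimate (assumption); real tokenizers count differently."""
    return len(text.split())


def truncating_summarizer(text: str, max_tokens: int = 20) -> str:
    """Stand-in for an LLM summarization call: keeps only the last max_tokens words."""
    return " ".join(text.split()[-max_tokens:])


class RollingContext:
    """Holds one running summary plus the most recent turns verbatim."""

    def __init__(self, summarize: Callable[[str], str], keep_recent: int = 2):
        self.summarize = summarize      # hypothetical hook; would call a model in practice
        self.keep_recent = keep_recent  # number of raw turns kept word for word
        self.summary = ""
        self.recent: List[str] = []

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        # Once the window overflows, fold the oldest raw turn into the summary.
        while len(self.recent) > self.keep_recent:
            oldest = self.recent.pop(0)
            self.summary = self.summarize((self.summary + "\n" + oldest).strip())

    def prompt(self) -> str:
        """Context to send on the next request: summary first, then recent turns."""
        parts = [f"Summary so far: {self.summary}"] if self.summary else []
        return "\n".join(parts + self.recent)


def demo_sizes(turns: int = 10) -> List[int]:
    """Prompt size after each turn, using 32-token synthetic turns."""
    ctx = RollingContext(truncating_summarizer, keep_recent=2)
    sizes = []
    for i in range(turns):
        ctx.add_turn(f"turn {i}: " + "word " * 30)
        sizes.append(count_tokens(ctx.prompt()))
    return sizes
```

With these synthetic turns the prompt size stops growing once the window is full, whereas the raw transcript would keep growing by one turn per request. The summarization calls themselves also cost tokens, which this sketch does not account for.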
Journey Context:
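LLM APIs charge for input tokens on every request. When the full history is re-sent each turn, the input per request grows linearly with the turn count, so the cumulative input tokens billed over the conversation grow quadratically. The model's attention computation per request also grows quadratically with context length \(raising latency and timeout risk\). With roughly equal-sized turns, the request at turn 10 carries about 10x the input of turn 1, and the ten requests together bill about 55 turn-sized units of input. Rolling summaries cap the input cost per turn.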
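The arithmetic can be checked with a short sketch. The figures used here are illustrative assumptions, not measurements: 500 tokens per turn, and a cap of 1500 input tokens per request for the bounded case. The function names are hypothetical.

```python
from typing import List


def full_history_input_tokens(turns: int, tokens_per_turn: int) -> List[int]:
    """Input tokens sent on each request when every prior turn is re-sent."""
    return [k * tokens_per_turn for k in range(1, turns + 1)]


def bounded_input_tokens(turns: int, tokens_per_turn: int, cap: int) -> List[int]:
    """Input tokens per request when context is capped (summary plus recent turns).

    Assumption: the cost of the summarization calls themselves is ignored.
    """
    return [min(k * tokens_per_turn, cap) for k in range(1, turns + 1)]


full = full_history_input_tokens(turns=10, tokens_per_turn=500)
bounded = bounded_input_tokens(turns=10, tokens_per_turn=500, cap=1500)

print(full[-1] // full[0])  # 10: the turn-10 request carries 10x the input of turn 1
print(sum(full))            # 27500 = 500 * (1 + 2 + ... + 10)
print(sum(bounded))         # 13500 = 500 + 1000 + 8 * 1500
```

The gap widens with conversation length, since the full-history total grows with the square of the turn count while the bounded total grows linearly.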
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:56:55.873695+00:00— report_created — created