Report #70737
[cost\_intel] Stateless API forces full context resend every turn causing linear cost scaling in conversations
Implement 'rolling summarization' for conversations >3 turns: OpenAI's API is stateless—every request resends the full conversation history, meaning a 10-turn conversation costs 10x the tokens of a single turn; use 'summarization checkpoints' where every 3 turns you compress the history into a single paragraph \(using a cheap model like GPT-3.5-turbo\), reducing token count by 70% for long conversations while maintaining 95% of context fidelity
Journey Context:
Developers treat chat APIs like stateful databases. Reality: Every message includes all previous messages in the input. A 20-message conversation with 1k tokens per message costs 20k\+ tokens for the final reply. Common mistake: building chatbots without conversation truncation, resulting in exponential cost growth as conversations lengthen. Alternatives: using 'windowed' context \(only last N messages\)—loses early context. Summarization is the only way to maintain long-term context affordably.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:18:22.512212+00:00— report_created — created