Report #45226
[cost_intel] Assistants API costs exploding with thread history
Truncate history manually: delete old messages from the thread, or start a new thread every N turns. The Assistants API persists the full history server-side and feeds it to the model on every run, unlike stateless chat completions.
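The "new thread every N turns" option can be sketched as below. This is a minimal sketch, not a verified workaround. It assumes `client` is an `openai.OpenAI()` instance from the official Python SDK with the `beta.threads` endpoints available. The helper names `message_text`, `should_rollover`, and `rollover_thread`, and the defaults `every_n_turns=10` and `keep_last=6`, are hypothetical and only for illustration.

```python
from typing import Any


def message_text(message: Any) -> str:
    """Join the text parts of one thread message, skipping non-text parts."""
    return "\n".join(
        part.text.value for part in message.content if part.type == "text"
    )


def should_rollover(turn_count: int, every_n_turns: int = 10) -> bool:
    """True on every N-th turn (hypothetical policy; tune N to your budget)."""
    return turn_count > 0 and turn_count % every_n_turns == 0


def rollover_thread(client: Any, old_thread_id: str, keep_last: int = 6) -> str:
    """Create a fresh thread seeded with only the newest `keep_last` messages.

    Assumption: `client` is an `openai.OpenAI()` instance. The old thread is
    left untouched, so it doubles as an archive. Returns the new thread id.
    """
    # order="desc" lists newest first, so a single page holds the tail we keep.
    page = client.beta.threads.messages.list(
        thread_id=old_thread_id, order="desc", limit=keep_last
    )
    tail = list(reversed(page.data))  # restore chronological order

    # Carry over plain text only; messages without text are dropped.
    seed = []
    for m in tail:
        text = message_text(m)
        if text:
            seed.append({"role": m.role, "content": text})

    new_thread = client.beta.threads.create(messages=seed)
    return new_thread.id
```

A caller would check `should_rollover(turn_count)` before creating each run and, when it returns true, switch to the id returned by `rollover_thread`. Retrieved file search chunks are not copied into the new thread, which is the point: only the short text tail is carried forward.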
Journey Context:
Unlike stateless chat.completions, OpenAI's Assistants API maintains thread state server-side. When you create a Run, the API feeds the thread's message history (including file search results with large text chunks) to the model. By default nothing is trimmed until the history no longer fits the model's context window, so a 50-turn conversation with file search is billed for up to 50 turns of history plus large retrieval chunks on every single run. This is a hidden cost trap: users expect to pay for the new turn only, as if the API were stateless, but are charged for the full context every time. The fix is aggressive thread management: cap what each run may read (Assistants API v2 exposes `truncation_strategy` and `max_prompt_tokens` on the run for this), copy the recent messages into a new thread and keep the old one as an archive, or migrate to stateless completions with manual RAG.
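To make the growth concrete, here is a back-of-the-envelope model of cumulative input tokens. It is a simplification under stated assumptions: every turn adds the same number of tokens, each run pays for the history it reads plus a fixed retrieval payload, and nothing is cached or discounted. The function name and all token figures are illustrative, not measured values.

```python
from typing import Optional


def cumulative_input_tokens(
    turns: int,
    tokens_per_turn: int,
    retrieval_tokens_per_run: int = 0,
    window: Optional[int] = None,
) -> int:
    """Total input tokens over `turns` runs when each run re-reads history.

    window=None models an untruncated thread; window=k models keeping only
    the newest k turns (by truncation or by rolling over to a new thread).
    """
    total = 0
    for run in range(1, turns + 1):
        # Run number `run` sees `run` turns of history unless a window caps it.
        history_turns = run if window is None else min(run, window)
        total += history_turns * tokens_per_turn + retrieval_tokens_per_run
    return total


if __name__ == "__main__":
    full = cumulative_input_tokens(turns=50, tokens_per_turn=200)
    capped = cumulative_input_tokens(turns=50, tokens_per_turn=200, window=10)
    print(f"untruncated: {full} tokens, 10-turn window: {capped} tokens")
```

With these illustrative numbers (50 turns at 200 tokens each), the untruncated thread accumulates 255,000 input tokens against 91,000 with a 10-turn window, while the new turns alone total only 10,000 tokens. The untruncated total grows roughly with the square of the turn count, which is why long threads become expensive so quickly.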
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:22:48.177913+00:00— report_created — created