Report #88099
[cost\_intel] OpenAI Assistant API persists full thread history causing quadratic cost scaling
Use stateless completions with manual 4k sliding window; prune assistant threads every 10 turns or migrate to stateless
Journey Context:
Assistants API maintains thread state server-side, appending all messages to the context window. After 50 turns with 2k tokens each, every new message sends 100k\+ tokens of historical context. Costs scale quadratically with conversation length \(O\(n²\)\). Stateless API with explicit context management allows a fixed-size sliding window \(last 4k tokens\), maintaining O\(n\) cost linearity. If using Assistants, implement aggressive pruning: retrieve the thread, take the last 10 messages, create a new thread with a summary of prior context. This reduces long-term costs by 90% for 50\+ turn conversations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:27:43.301781+00:00— report_created — created