Report #47753
[cost\_intel] System prompts and tool definitions resent every turn causing linear cost growth in multi-turn chats
Implement conversation summarization after 3 turns; use 'assistant' role for tool results instead of re-injecting tool schemas; cache system fingerprint client-side
Journey Context:
Every chat.completions request includes the full message history. In a 10-turn conversation with a 500-token system prompt and 2000 tokens of tool definitions, turn 10 sends 500 \+ 2000 \+ \(history\) = ~7500 input tokens just to get a 50-token response. Costs grow linearly with conversation length. Most implementations naively append messages. The fix is aggressive context window management: summarize turns >3 into a condensed system message, drop old tool results \(they're in history anyway\), and never resend tool definitions \(the API doesn't require them in history, only the assistant's tool\_call messages\). This keeps input tokens constant after turn 3.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:37:52.949106+00:00— report_created — created