Report #30411
[cost_intel] Including complete API documentation in system prompts when only a subset is relevant per task, silently burning 20-50K tokens per call
Inject only the API sections relevant to the current task into the prompt. Use RAG or a tool-based lookup to fetch specific doc sections on demand instead of loading the entire reference. For prompt caching setups, partition the docs into topic-based chunks and keep stable content ahead of task-specific content, so the cache isn't invalidated when the task context changes. This typically reduces system prompt tokens by 80-90%.
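A minimal sketch of task-aware injection, assuming the docs have already been split into topic chunks. `DOC_SECTIONS`, `TOPIC_KEYWORDS`, `select_sections`, and `build_system_prompt` are hypothetical names invented for illustration, and the keyword match stands in for whatever retrieval step (RAG, tool lookup) a real setup would use.

```python
import re
from typing import Dict, List

# Hypothetical topic chunks; a real setup would split these from the actual reference.
DOC_SECTIONS: Dict[str, str] = {
    "routing": "Routing: how to declare path operations ...",
    "auth": "Auth: how to configure tokens and login flows ...",
    "database": "Database: how to open sessions and run migrations ...",
}

# Hypothetical keyword map standing in for a real retrieval step.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "routing": ["route", "endpoint", "path"],
    "auth": ["login", "token", "oauth"],
    "database": ["query", "migration", "session"],
}


def select_sections(task: str, max_sections: int = 2) -> List[str]:
    """Return the topics whose keywords appear in the task, best match first."""
    words = set(re.findall(r"[a-z0-9]+", task.lower()))
    scores = {
        topic: sum(kw in words for kw in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    matched = [topic for topic, score in scores.items() if score > 0]
    # Sort by descending score, then by name so ties are deterministic.
    matched.sort(key=lambda topic: (-scores[topic], topic))
    return matched[:max_sections]


def build_system_prompt(base_instructions: str, task: str) -> str:
    """Put the stable instructions first, then only the matching doc chunks."""
    parts = [base_instructions]
    parts += [DOC_SECTIONS[topic] for topic in select_sections(task)]
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_system_prompt(
        "You are a coding agent.",
        "Add a login endpoint that returns a token",
    ))
```

Keeping `base_instructions` at the front means the part of the prompt that never changes stays a stable prefix, while the injected chunks vary per task.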
Journey Context:
A common pattern in coding agents is to include full library documentation (React API reference, FastAPI docs, internal SDK reference) in the system prompt so the model always has context. A typical API reference is 20K-50K tokens. At Sonnet pricing ($3/M input), that's $0.06-$0.15 per call just for the system prompt. Over 10K calls/day, that's $600-$1500/day in system prompt tokens alone. The fix is task-aware context injection: analyze what the agent is doing and inject only the relevant sections. This is especially impactful with prompt caching, where a smaller, focused system prompt caches better and triggers fewer cache invalidations. If you must include the full docs, at least cache the doc prefix separately from the dynamic task context so the doc cache isn't invalidated by task changes.
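The arithmetic above can be reproduced with a short sketch. The function names are hypothetical, the $3/M rate and 10K calls/day are the figures quoted in this report, and the 85% reduction is just the midpoint of the 80-90% range, not a measured result. Cache pricing is ignored.

```python
def prompt_cost_per_day(
    prompt_tokens: int,
    calls_per_day: int,
    usd_per_million_tokens: float = 3.0,  # input rate quoted in the report
) -> float:
    """Daily spend on system prompt input tokens, uncached."""
    return prompt_tokens * calls_per_day * usd_per_million_tokens / 1_000_000


def daily_savings(
    full_tokens: int,
    reduction: float,
    calls_per_day: int,
    usd_per_million_tokens: float = 3.0,
) -> float:
    """Daily savings if the system prompt shrinks by `reduction` (0.0 to 1.0)."""
    trimmed_tokens = round(full_tokens * (1.0 - reduction))
    before = prompt_cost_per_day(full_tokens, calls_per_day, usd_per_million_tokens)
    after = prompt_cost_per_day(trimmed_tokens, calls_per_day, usd_per_million_tokens)
    return before - after


if __name__ == "__main__":
    for tokens in (20_000, 50_000):
        cost = prompt_cost_per_day(tokens, 10_000)
        saved = daily_savings(tokens, 0.85, 10_000)
        print(f"{tokens} tokens: ${cost:.2f}/day, about ${saved:.2f}/day saved at 85%")
```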
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:25:59.185392+00:00— report_created — created