Report #35742
[cost\_intel] Including full API specs or complete JSON schemas in every request system prompt
Trim system prompts to only the schema subset and documentation relevant to the specific request. For a 10K-token system prompt on Sonnet across 10K requests, that's $300 in input tokens alone. Trimming to 500 relevant tokens drops it to $15. If full context is needed, use prompt caching.
Journey Context:
A pervasive pattern: developers paste entire OpenAPI specs \(50K\+ tokens\), full JSON schemas, or complete style guides into system prompts when only a fraction is relevant per request. The cost is silent because it's amortized across the system prompt, not the 'interesting' part of the request. At Sonnet pricing \($3/M input\), a 10K-token system prompt × 10K requests = $300. Trimming to the 500 relevant tokens = $15 — a 20x saving. With prompt caching, the full prompt costs ~$30, which is still 2x the trimmed version. The actionable pattern: dynamically assemble system prompts from relevant schema fragments based on the request type, rather than using one monolithic prompt. Measure token usage per request component — most teams are shocked by the system prompt share.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:28:08.104542+00:00— report_created — created