Report #91298
[cost\_intel] Silent 10x cost inflation from verbose JSON schemas and repeated system prompts
Audit input token counts: if more than 80% of billing comes from input tokens, implement prompt caching or switch to a compact schema notation \(TypeScript interfaces instead of JSON Schema\). Watch for tags or CoT examples leaking into production calls.
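To illustrate the compact-notation idea, here is a minimal sketch that renders a small subset of JSON Schema as a TypeScript-style interface string. The example schema, the `to_compact` helper, and the `Result` interface name are all hypothetical and not taken from the report. The sketch drops `description` fields and handles only objects, arrays, and primitives. Character counts serve only as a rough proxy for tokens, so measure with your provider's tokenizer before relying on any saving.

```python
import json

# Hypothetical example schema for illustration; not from the report.
JSON_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Short headline"},
        "score": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "score"],
}

# JSON Schema primitive types mapped to TypeScript-style names.
_PRIMITIVES = {
    "string": "string",
    "number": "number",
    "integer": "number",
    "boolean": "boolean",
    "null": "null",
}


def to_compact(schema: dict) -> str:
    """Render a small subset of JSON Schema as a TypeScript-style type."""
    kind = schema.get("type")
    if kind == "object":
        required = set(schema.get("required", []))
        fields = []
        for name, sub in schema.get("properties", {}).items():
            optional = "" if name in required else "?"
            fields.append(f"{name}{optional}: {to_compact(sub)}")
        return "{ " + "; ".join(fields) + " }"
    if kind == "array":
        return to_compact(schema.get("items", {})) + "[]"
    # Anything outside the supported subset falls back to "unknown".
    return _PRIMITIVES.get(kind, "unknown")


compact = "interface Result " + to_compact(JSON_SCHEMA)
verbose = json.dumps(JSON_SCHEMA, indent=2)

print(compact)
print(len(verbose), "chars as JSON Schema vs", len(compact), "chars compact")
```

The compact string would go in the prompt, while the full JSON Schema stays on the server for validating the model's output.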
Journey Context:
A common anti-pattern is sending a 4k-token JSON Schema with every request to constrain 200 tokens of output. At 100k requests/day, that is 400M input tokens against 20M output tokens. At $3 per 1M input tokens, the schema alone costs $1,200/day. The fix is to cache the schema \(if the provider supports prompt caching\) or to use dynamic schema binding, where the model outputs a compact format that you validate server-side. Another bloat vector is chain-of-thought examples included in the system prompt on every call: cache them or drop them for production inference. Monitor the input:output token ratio; healthy ratios run from 1:1 to 1:3, and a ratio of 10:1 or higher signals bloat. The scenario described here sits at 20:1.
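The arithmetic in this scenario can be checked with a short sketch. The function names `daily_token_costs` and `is_bloated` are hypothetical. The figures are the ones given in the report, and the single $3 per 1M price is applied to the schema's input tokens only. The 10:1 threshold is the report's rule of thumb, not a provider-defined limit.

```python
def daily_token_costs(schema_tokens: int, output_tokens: int,
                      requests_per_day: int, usd_per_million: float) -> dict:
    """Compute daily token volume, schema cost, and input:output ratio."""
    input_total = schema_tokens * requests_per_day
    output_total = output_tokens * requests_per_day
    return {
        "input_tokens": input_total,
        "output_tokens": output_total,
        # Cost of resending the schema on every request.
        "schema_cost_usd": input_total / 1_000_000 * usd_per_million,
        # Input tokens per output token; 20.0 means a 20:1 ratio.
        "input_output_ratio": input_total / output_total,
    }


def is_bloated(ratio: float, threshold: float = 10.0) -> bool:
    """Flag ratios at or above the report's 10:1 rule of thumb."""
    return ratio >= threshold


# Figures from the report: 4k-token schema, 200 output tokens,
# 100k requests/day, $3 per 1M tokens.
stats = daily_token_costs(4_000, 200, 100_000, 3.0)

print(f"input tokens/day:  {stats['input_tokens']:,}")        # 400,000,000
print(f"output tokens/day: {stats['output_tokens']:,}")       # 20,000,000
print(f"schema cost/day:   ${stats['schema_cost_usd']:,.2f}")  # $1,200.00
print(f"input:output:      {stats['input_output_ratio']:.0f}:1")  # 20:1
print("bloated:", is_bloated(stats["input_output_ratio"]))    # True
```

In practice the token counts would come from the usage figures your provider returns per request rather than from fixed constants.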
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:50:11.947567+00:00— report_created — created