Report #35554
[cost_intel] System prompts growing over time with accumulated instructions that rarely activate
Audit system prompts quarterly. Measure each instruction's activation rate, meaning the fraction of requests where the instruction actually influenced the output (test by removing it and measuring the quality delta). Remove instructions with an activation rate below 5%: they add their full token cost to every request for negligible value. Common bloat categories: deprecated format requirements, redundant safety instructions the model already follows by default, and overly specific edge-case handling that should be included conditionally. A 3K-token system prompt with 40% bloat (1.2K tokens) costs about 1.7x as much in input tokens as the 1.8K-token focused prompt left after trimming, with no measurable quality loss.
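A minimal sketch of the leave-one-out measurement described above, assuming you already have an eval set and some way to score one case against a prompt. The names `activation_rates`, `prune`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in: a real audit would score actual model outputs (for example with a rubric or a judge model). An instruction counts as "activated" on a case when removing it lowers that case's score by more than `tolerance`.

```python
from typing import Callable, Sequence, TypeVar

Case = TypeVar("Case")


def activation_rates(
    instructions: Sequence[str],
    cases: Sequence[Case],
    score: Callable[[str, Case], float],
    tolerance: float = 0.0,
) -> list[float]:
    """Fraction of eval cases whose score drops when each instruction is removed."""
    if not cases:
        raise ValueError("need at least one eval case")
    items = list(instructions)
    full_prompt = "\n".join(items)
    baseline = [score(full_prompt, case) for case in cases]
    rates = []
    for i in range(len(items)):
        # Leave-one-out ablation: rebuild the prompt without instruction i.
        ablated = "\n".join(items[:i] + items[i + 1:])
        activated = sum(
            1
            for case, base in zip(cases, baseline)
            if base - score(ablated, case) > tolerance
        )
        rates.append(activated / len(cases))
    return rates


def prune(
    instructions: Sequence[str], rates: Sequence[float], min_rate: float = 0.05
) -> list[str]:
    """Keep only instructions whose activation rate meets the threshold."""
    return [ins for ins, rate in zip(instructions, rates) if rate >= min_rate]


# Toy demo (hypothetical data): each eval case "needs" exactly one instruction.
prompt_lines = ["Respond in JSON.", "Cite sources.", "Never use emoji."]
eval_cases = ["Respond in JSON."] * 17 + ["Cite sources."] * 3


def toy_score(prompt: str, needed: str) -> float:
    # Stand-in scorer: full marks only if the needed instruction is present.
    return 1.0 if needed in prompt else 0.0


rates = activation_rates(prompt_lines, eval_cases, toy_score)
print(rates)  # [0.85, 0.15, 0.0]
print(prune(prompt_lines, rates))  # ['Respond in JSON.', 'Cite sources.']
```

Note that the audit itself costs (number of instructions + 1) × (number of eval cases) scored runs, which is one reason to run it quarterly rather than continuously.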
Journey Context:
System prompts accrete like sediment. Someone adds an instruction for an edge case, and it never gets removed. Another person adds a 'clarification' that duplicates an existing instruction. Over 6-12 months, a 500-token prompt grows to 3000 tokens. The cost: every single request pays for all 3000 tokens. At Sonnet pricing ($3/M input tokens) and 10K requests/day, the 2500 bloat tokens cost $75/day, or about $27K/year, in pure waste.
The fix isn't just trimming; it's measuring. The surprising finding from audits: many 'important' instructions have zero measurable impact on output quality because the model already behaves that way by default. The highest-bloat categories, ranked: (1) redundant safety and ethics instructions, (2) format specifications already enforced by structured output schemas, (3) example outputs that duplicate the schema definition, and (4) 'don't do X' instructions where the model wouldn't do X anyway. The anti-pattern: adding instructions reactively after every bad output instead of fixing the root cause (usually the schema or task definition).
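The waste figure is straightforward arithmetic: bloat tokens × requests per day × price per million input tokens. A minimal sketch, assuming every request resends the full system prompt at the base input price (no prompt-caching discount) and that output tokens are unaffected; `bloat_cost_per_day` is a hypothetical helper name.

```python
def bloat_cost_per_day(
    bloat_tokens: int, requests_per_day: int, usd_per_million_input: float
) -> float:
    """Daily input-token spend attributable to bloat tokens.

    Assumes the whole system prompt is billed at the base input rate on
    every request (no caching discount).
    """
    return bloat_tokens * requests_per_day * usd_per_million_input / 1_000_000


# Figures from the report: a 500-token prompt that grew to 3000 tokens.
bloat = 3000 - 500
daily = bloat_cost_per_day(bloat, requests_per_day=10_000, usd_per_million_input=3.0)
yearly = daily * 365
print(f"${daily:,.2f}/day, ${yearly:,.0f}/year")  # $75.00/day, $27,375/year
```

If prompt caching is in use, the cached portion is billed at a reduced rate, so the same formula with the effective per-token price gives a smaller, but still nonzero, figure.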
Lifecycle
2026-06-18T14:08:59.815071+00:00 - report_created - created