Report #59990
[cost_intel] Using a 'universal prompt' with full safety rails and CoT examples for every request, regardless of complexity
Implement dynamic prompt routing: use a small classifier model (Haiku/mini class) to send simple tasks to a minimal prompt (no CoT, examples, or rails) and complex tasks to the full prompt. Reported saving: 50-70% of input tokens on average.
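A minimal sketch of the routing step, under stated assumptions: the template strings, the `route_prompt` function, and the `heuristic_classifier` stand-in are all hypothetical names introduced for illustration. In practice the classifier would be a call to a small model; here it is an injectable callable so the sketch runs without any API.

```python
from typing import Callable

# Hypothetical templates; real ones would contain the actual rails and examples.
MINIMAL_PROMPT = "You are a helpful assistant."
FULL_PROMPT = (
    "You are a helpful assistant.\n"
    "[safety rails]\n"
    "[chain-of-thought examples]\n"
    "[formatting instructions]"
)

# A classifier takes the user query and returns "simple" or "complex".
Classifier = Callable[[str], str]


def heuristic_classifier(query: str) -> str:
    """Stand-in for the small-model call (an assumption, not a real API)."""
    hard_markers = ("why", "explain", "compare", "debug", "step")
    lowered = query.lower()
    if len(query.split()) > 12 or any(m in lowered for m in hard_markers):
        return "complex"
    return "simple"


def route_prompt(query: str, classify: Classifier = heuristic_classifier) -> str:
    """Select the system prompt for a query.

    Any label other than "simple" falls back to the full prompt, so a
    misbehaving classifier costs tokens rather than quality.
    """
    label = classify(query)
    return MINIMAL_PROMPT if label == "simple" else FULL_PROMPT
```

Defaulting to the full prompt on an unexpected label is a design choice in this sketch: it keeps the failure mode on the expensive side rather than the unsafe side.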
Journey Context:
Developers often create a single 'master prompt' that bundles every safety rail, chain-of-thought example, and formatting instruction, then send it with every query. This bloats simple requests (e.g., 'hello' → 2k tokens of overhead). A router pattern instead uses a small model (costing on the order of $0.0001 per classification) to judge complexity and select the matching prompt template (minimal vs. full). Average input tokens drop by 50-70%, and quality should not suffer because complex queries still get the full treatment.
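The size of the saving depends on the traffic mix, which a back-of-the-envelope calculation makes concrete. The function below and every number passed to it are illustrative assumptions, apart from the 2k-token overhead quoted above; it counts only input tokens sent to the main model and ignores the classifier's own (much cheaper) call.

```python
def input_token_savings(
    simple_share: float,
    full_overhead: int,
    minimal_overhead: int,
    query_tokens: int,
) -> float:
    """Fraction of main-model input tokens saved by routing.

    simple_share     -- fraction of requests classified as simple (0..1)
    full_overhead    -- tokens in the full prompt template
    minimal_overhead -- tokens in the minimal prompt template
    query_tokens     -- average tokens in the user query itself
    """
    baseline = full_overhead + query_tokens
    routed = (
        simple_share * (minimal_overhead + query_tokens)
        + (1 - simple_share) * (full_overhead + query_tokens)
    )
    return 1 - routed / baseline


# Hypothetical mix: 70% simple traffic, 2000-token full prompt,
# 50-token minimal prompt, 20-token average query.
saving = input_token_savings(0.7, 2000, 50, 20)
```

With these assumed inputs the saving comes out at roughly 68%, inside the 50-70% range; a workload with fewer simple requests would save proportionally less.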
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:10:42.178392+00:00— report_created — created