Report #26881
[cost\_intel] Using a single model tier for all tasks in an agent pipeline regardless of sub-task complexity
Implement two-tier routing: tag each sub-task as constrained \(extraction, classification, formatting, routing\) or open \(planning, reasoning, generation\), then route constrained tasks to small models and open tasks to frontier models. This typically reduces total pipeline cost by 60-80% with under 5% quality impact.
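A minimal sketch of tag-based routing, assuming a Python agent framework. The names `TaskType`, `pick_model`, `run_task`, and the model identifiers `small-model` and `frontier-model` are hypothetical placeholders, not part of any real SDK; `call_model` stands in for whatever client function your framework already uses.

```python
from enum import Enum
from typing import Callable


class TaskType(Enum):
    # Constrained tasks: narrow output space, small models usually suffice.
    EXTRACTION = "extraction"
    CLASSIFICATION = "classification"
    FORMATTING = "formatting"
    ROUTING = "routing"
    # Open tasks: need planning or multi-step reasoning.
    PLANNING = "planning"
    REASONING = "reasoning"
    GENERATION = "generation"


CONSTRAINED = frozenset({
    TaskType.EXTRACTION,
    TaskType.CLASSIFICATION,
    TaskType.FORMATTING,
    TaskType.ROUTING,
})

# Placeholder identifiers (assumption): substitute your provider's model names.
SMALL_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"


def pick_model(task_type: TaskType) -> str:
    """Return the model tier for a tagged sub-task."""
    return SMALL_MODEL if task_type in CONSTRAINED else FRONTIER_MODEL


def run_task(
    task_type: TaskType,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> str:
    """Route one tagged LLM call; call_model(model, prompt) is supplied by the caller."""
    return call_model(pick_model(task_type), prompt)
```

Keeping the constrained set in one place means downgrading or re-upgrading a task type is a one-line change, which suits rolling the change out one sub-task at a time.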
Journey Context:
Agent pipelines are heterogeneous: a coding agent parses the request \(extraction\), selects tools \(classification\), generates code \(reasoning\), formats output \(formatting\), and validates results \(classification\). Only one or two of these steps need frontier capability. Routing does not require a learned model; simple task-type tags suffice. Implementation: define a task taxonomy enum, tag each LLM call in your agent framework, and route based on the tag. The cost math: Haiku \($0.25/$1.25 per MTok input/output\) costs one-twelfth as much as Sonnet \($3/$15\), so if 70% of calls are constrained and routed to Haiku, and constrained and open calls use similar token volumes, total pipeline cost drops by roughly 64% \(0.7 × 11/12\). Quality impact is near zero on constrained tasks because small models already perform at ceiling on them. The pattern seen in production systems: start with all-frontier for quality, benchmark each sub-task type, then downgrade constrained tasks one at a time with regression testing. This incremental approach catches edge cases where a seemingly simple task actually requires frontier reasoning.
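The cost arithmetic can be checked with a short sketch. It assumes constrained and open calls consume similar token volumes, so the share of calls approximates the share of spend; the function name `blended_savings` is illustrative. The prices are the per-MTok input prices quoted above, and the output prices \($1.25 vs $15\) give the same 1:12 ratio.

```python
def blended_savings(constrained_share: float,
                    small_price: float,
                    frontier_price: float) -> float:
    """Fraction of cost saved versus running every call on the frontier model.

    Assumes every call uses a similar number of tokens, so cost scales
    with the share of calls routed to each tier.
    """
    routed_cost = ((1 - constrained_share)
                   + constrained_share * small_price / frontier_price)
    return 1 - routed_cost


# 70% of calls moved from $3/MTok to $0.25/MTok input pricing.
savings = blended_savings(0.70, small_price=0.25, frontier_price=3.00)
print(f"{savings:.1%}")  # prints 64.2%
```

Under the same assumptions, the ceiling is about 92% \(11/12\) if every call could be downgraded, so the saving is driven mostly by how large the constrained share actually is in your pipeline.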
Lifecycle
2026-06-17T23:31:13.214824+00:00— report_created — created