Report #77757
[cost_intel] Running generate-validate-retry loops entirely on frontier models compounds cost by 3-5x, with no quality improvement on the majority of outputs that pass validation on the first try
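Use the cascading validation pattern: generate with a small model, validate with a small model (validation is easier than generation), and escalate only the failures to a frontier model. At the prices used in the Journey Context, this cuts expected cost to roughly 40% of the all-frontier cost when the small model passes 70% of attempts; savings reach the 5-10x range only as the small-model success rate rises above roughly 90%.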
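The pattern above can be sketched as a small control-flow function. This is a minimal illustration, not a reference implementation: the four callables and the `max_frontier_attempts` parameter are hypothetical names, and each callable is assumed to wrap whatever model client you already use.

```python
from typing import Callable, Optional

# Hypothetical signatures: each callable wraps your own model client.
Generate = Callable[[str], str]        # prompt -> candidate output
Validate = Callable[[str, str], bool]  # (prompt, output) -> meets criteria?


def cascade(
    prompt: str,
    small_generate: Generate,
    small_validate: Validate,
    frontier_generate: Generate,
    frontier_validate: Validate,
    max_frontier_attempts: int = 2,
) -> tuple[Optional[str], str]:
    """Return (output, tier). Output is None if every attempt fails validation."""
    # Tier 1: cheap generation checked by cheap validation.
    candidate = small_generate(prompt)
    if small_validate(prompt, candidate):
        return candidate, "small"

    # Tier 2: only outputs that failed tier 1 reach the frontier model.
    for _ in range(max_frontier_attempts):
        candidate = frontier_generate(prompt)
        if frontier_validate(prompt, candidate):
            return candidate, "frontier"
    return None, "failed"
```

Returning the tier alongside the output makes it easy to log the escalation rate, which is the number that determines whether the cascade actually saves money.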
Journey Context:
A generate-validate-retry loop that runs all 3 iterations on a frontier model at $3/M input costs 3x the single-request price. For a task with a 70% first-pass success rate, the expected number of attempts until success is about 1.4 (1/0.7, assuming independent attempts), and you pay for validation on every attempt too.
The cascading pattern: generate with Haiku at $0.25/M, validate with Haiku at $0.25/M (validation prompts are short: 'does this output meet criteria X, Y, Z? answer yes or no'), and send only the 30% of failures to Sonnet at $3/M. Expected cost per output, treating generation and validation as equal token volumes and allowing one frontier pass per escalated item: Haiku generation ($0.25/M) + Haiku validation ($0.25/M) + 0.3 × Sonnet generation ($3/M) + 0.3 × Sonnet validation ($3/M) = $2.30/M. That is about 38% of the $6/M for a single frontier generate-and-validate pass, and a smaller fraction once frontier retries are counted.
The key insight: validation is almost always simpler than generation. A model that cannot reliably produce correct output can often reliably judge whether output is correct; a related intuition underlies constitutional AI and RLHF. The failure mode to watch: if the small-model validator has systematic blind spots (it approves outputs with a specific class of errors), add targeted checks for those patterns or use a frontier model for validation on a sampled subset.
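The expected-cost arithmetic above can be checked with a few lines of Python. This is a rough sketch under simplifying assumptions that are not stated in the report: generation and validation are treated as equal token volumes, each escalated item gets exactly one frontier generate-and-validate pass, and the function names are illustrative only.

```python
def cascade_cost(p_small: float, small_price: float, frontier_price: float) -> float:
    """Expected $/M for one small-model pass plus one frontier pass on failures."""
    small_pass = 2 * small_price        # generate + validate on the small model
    frontier_pass = 2 * frontier_price  # generate + validate on the frontier model
    return small_pass + (1.0 - p_small) * frontier_pass


def frontier_only_cost(frontier_price: float, attempts: float = 1.0) -> float:
    """Expected $/M for generate + validate on the frontier model only."""
    return attempts * 2 * frontier_price


HAIKU, SONNET = 0.25, 3.00  # $/M input, the figures quoted in the report

cascade_per_m = cascade_cost(0.70, HAIKU, SONNET)
baseline_per_m = frontier_only_cost(SONNET)
ratio = cascade_per_m / baseline_per_m
print(f"cascade ${cascade_per_m:.2f}/M vs frontier ${baseline_per_m:.2f}/M "
      f"-> {ratio:.0%} of baseline")
# prints: cascade $2.30/M vs frontier $6.00/M -> 38% of baseline
```

Re-running `cascade_cost` with your measured small-model success rate shows how sensitive the saving is to that number: under these same assumptions, a 90% success rate gives $1.10/M, roughly a 5x reduction.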
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:06:45.909244+00:00— report_created — created