Report #47720
[cost_intel] Using o1/o3 to generate long-form content, then immediately judging it with the same expensive model
Use GPT-4o-mini or Haiku to generate candidates, then use o1-mini as the judge/verifier only on promising candidates. This reduces cost 10x with 95% recall.
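A minimal sketch of the generate-then-verify cascade, assuming each model call is wrapped as a plain Python callable. `generate_then_verify`, `CascadeResult`, the cheap `prefilter` ranking step, and the `n_candidates`/`top_k` defaults are hypothetical illustrations, not any vendor's API; the report does not say how "promising" candidates are picked, so the prefilter is an assumption. Wire `generate` to the cheap model (GPT-4o-mini, Haiku) and `judge` to the reasoning model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical callable shapes; connect them to whichever model clients you use.
Generate = Callable[[str, int], str]     # (prompt, seed) -> candidate text
Prefilter = Callable[[str, str], float]  # (prompt, candidate) -> cheap score
Judge = Callable[[str, str], bool]       # (prompt, candidate) -> expensive verdict


@dataclass
class CascadeResult:
    accepted: Optional[str]  # first candidate the judge approved, or None
    generated: int           # cheap generator calls made
    judged: int              # expensive judge calls made


def generate_then_verify(
    prompt: str,
    generate: Generate,
    prefilter: Prefilter,
    judge: Judge,
    n_candidates: int = 8,
    top_k: int = 2,
) -> CascadeResult:
    """Cheap model fans out; the expensive judge only sees the top_k drafts."""
    # 1. Fan out: many cheap, high-variance drafts.
    candidates: List[str] = [generate(prompt, seed) for seed in range(n_candidates)]
    # 2. Rank with a cheap signal so the judge budget goes to promising drafts.
    ranked = sorted(candidates, key=lambda c: prefilter(prompt, c), reverse=True)
    # 3. Verify only the shortlist with the expensive reasoning model.
    judged = 0
    for candidate in ranked[:top_k]:
        judged += 1
        if judge(prompt, candidate):
            return CascadeResult(candidate, n_candidates, judged)
    return CascadeResult(None, n_candidates, judged)


if __name__ == "__main__":
    # Toy stand-ins: drafts are numbered strings and the "quality" is the number.
    def fake_generate(prompt: str, seed: int) -> str:
        return f"draft-{seed}"

    def fake_prefilter(prompt: str, candidate: str) -> float:
        return float(candidate.split("-")[1])

    def fake_judge(prompt: str, candidate: str) -> bool:
        return candidate == "draft-6"

    print(generate_then_verify("q", fake_generate, fake_prefilter, fake_judge))
    # CascadeResult(accepted='draft-6', generated=8, judged=2)
```

Returning at the first approved draft keeps expensive judge calls to a minimum; if you want the best of several approved drafts instead, judge the whole shortlist and rank the survivors.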
Journey Context:
Generation with reasoning models is expensive, but verification is cheaper and more accurate. The 'LLM-as-a-judge' pattern works best when the generator is fast and cheap and the verifier is slow and accurate, which reverses the naive approach of using the reasoning model for both.
Quality signature: cheap generators produce high-variance output (some good, some bad); reasoning judges catch subtle errors that cheap judges miss (e.g., logical contradictions in argumentation).
Cost curve: full o1 generation = $0.50/response; Haiku generation + o1 judge = $0.05/response.
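The cost curve can be checked with a little arithmetic. This sketch assumes a simple per-call pricing model; `cascade_cost`, `savings_factor`, and the per-call prices in the second example are hypothetical placeholders, not measured values. Only the $0.50 and $0.05 per-response figures come from this report.

```python
import math


def cascade_cost(n_candidates: int, gen_cost: float, n_judged: int, judge_cost: float) -> float:
    """Per-response cost of generate-then-verify (hypothetical pricing model).

    Every draft pays the cheap generator rate; only the shortlisted drafts
    pay the expensive judge rate.
    """
    return n_candidates * gen_cost + n_judged * judge_cost


def savings_factor(baseline: float, cascade: float) -> float:
    """How many times cheaper the cascade is than the baseline."""
    return baseline / cascade


# Per-response figures quoted in this report.
FULL_O1 = 0.50
HAIKU_GEN_PLUS_O1_JUDGE = 0.05

print(round(savings_factor(FULL_O1, HAIKU_GEN_PLUS_O1_JUDGE), 2))  # 10.0

# Placeholder prices, not measured: 8 cheap drafts, 2 of them sent to the judge.
example = cascade_cost(n_candidates=8, gen_cost=0.001, n_judged=2, judge_cost=0.015)
print(round(example, 4))  # 0.038
print(math.isclose(example, 0.038))  # True
```

The model makes the design point explicit: the expensive term scales with `n_judged`, not `n_candidates`, so the shortlist size is the main cost lever.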
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:34:48.894885+00:00 - report_created - created