Report #78612
[cost_intel] Expensive o1 usage for high-volume toxicity and PII detection
Use GPT-4o-mini or Claude 3 Haiku for classification at roughly 1/50th the cost; o1 adds <2% accuracy on deterministic pattern matching
Journey Context:
Classification tasks (toxicity, spam, regex-like PII patterns) rely on surface-level feature extraction, where instruct models achieve >95% F1 with few-shot prompting. o1's reasoning adds no value for 'does this contain a phone number?' or 'is this toxic?' because these are pattern-matching problems, not novel reasoning. The cost differential is extreme: $0.15/1M vs $7.50/1M tokens (GPT-4o-mini vs o1-mini). At 1,000 RPS and roughly 1,000 tokens per request (1M tokens/s), that is about $0.15 vs $7.50 per second, or roughly $540 vs $27,000 per hour. Worse, o1's latency (~5s) is unacceptable for real-time moderation streams. Use specialized small models (distilbert-size) or regex heuristics for the first pass, and reserve o1 for ambiguous appeals that require semantic nuance.
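The per-second cost depends entirely on how many tokens each request consumes, which the per-1M price alone does not determine. A minimal sketch of the arithmetic, assuming about 1,000 tokens per request (prompt plus completion) and one blended price per model taken from the figures quoted in this report (not checked against current vendor price lists):

```python
# Prices in USD per 1M tokens, as quoted in this report.
# Assumption: one blended price per model; real APIs price input and output separately.
PRICE_PER_1M = {
    "small_instruct": 0.15,  # figure quoted for GPT-4o-mini
    "reasoning": 7.50,       # figure quoted for the o1-family model
}


def cost_per_second(price_per_1m: float, rps: float, tokens_per_request: float) -> float:
    """USD spent per second at a steady request rate."""
    tokens_per_second = rps * tokens_per_request
    return tokens_per_second / 1_000_000 * price_per_1m


if __name__ == "__main__":
    RPS = 1_000
    TOKENS_PER_REQUEST = 1_000  # assumption: prompt + completion combined
    for name, price in PRICE_PER_1M.items():
        per_s = cost_per_second(price, RPS, TOKENS_PER_REQUEST)
        print(f"{name}: ${per_s:.2f}/s, ${per_s * 3600:,.0f}/h")
```

Under these assumptions it prints $0.15/s ($540/h) for the small model and $7.50/s ($27,000/h) for the reasoning model. Actual o1 bills are likely higher still, since output tokens are priced above input tokens and the model's hidden reasoning tokens are billed as output.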
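The tiered approach described here can be sketched as a router: cheap regex patterns first, then a small classifier, with only the uncertain band escalated to the reasoning model. This is a minimal sketch under stated assumptions: the two patterns, the `stub_classifier` function, and the 0.2/0.8 thresholds are hypothetical placeholders. A real deployment would replace the stub with a fine-tuned distilbert-size model and tune the thresholds on labelled data.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical first-pass PII patterns; real coverage needs far more than this.
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Verdict:
    label: str                    # "pii", "toxic", "ok", or "escalate"
    tier: str                     # which stage produced the verdict
    score: Optional[float] = None  # classifier P(toxic), if one was computed


def stub_classifier(text: str) -> float:
    """Hypothetical stand-in for a small fine-tuned model; returns P(toxic)."""
    lowered = text.lower()
    if "you idiot" in lowered:
        return 0.95
    if "kill" in lowered:  # ambiguous: "kill the process" vs. a threat
        return 0.5
    return 0.05


def moderate(
    text: str,
    small_classifier: Callable[[str], float],
    low: float = 0.2,   # hypothetical: below this, treat as clean
    high: float = 0.8,  # hypothetical: above this, treat as toxic
) -> Verdict:
    # Tier 1: deterministic patterns, effectively free per request.
    if PHONE_RE.search(text) or EMAIL_RE.search(text):
        return Verdict("pii", "regex")
    # Tier 2: small classifier handles the confident cases.
    p = small_classifier(text)
    if p >= high:
        return Verdict("toxic", "small_model", p)
    if p <= low:
        return Verdict("ok", "small_model", p)
    # Tier 3: only the ambiguous band is queued for the expensive reasoning model.
    return Verdict("escalate", "reasoning_queue", p)


if __name__ == "__main__":
    for msg in ["Call me at 555-123-4567", "you idiot", "hello there", "how do I kill the process?"]:
        print(msg, "->", moderate(msg, stub_classifier))
```

Only the last message reaches the expensive tier. The design choice is that reasoning-model spend then scales with the escalation rate rather than with total traffic.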
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:32:56.590289+00:00— report_created — created