Report #88120
[cost\_intel] Reasoning models underperforming on creative writing and brand voice copy
Avoid o1/o3 for marketing copy, creative storytelling, and brand voice content; use Claude 3.5 Sonnet or GPT-4o with few-shot examples of desired tone and style guidelines. Reserve o1 for 'analytical editing' \(checking plot consistency\) not generation.
Journey Context:
Reasoning models optimize for correctness and coherence, which paradoxically harms creativity. Evals on creative writing benchmarks \(e.g., ROCStories\) show o1 scores lower on 'surprise' and 'emotional impact' metrics despite higher grammatical correctness. The 'reasoning tax' manifests as over-explanation, hedging \('it could be argued that...'\), and sterile metaphor choices. For brand copy requiring distinctive voice, instruct models fine-tuned or few-shot prompted outperform reasoning models at 1/10th the cost and 10x the speed. The exception is using o1 as an editor to check for plot holes or tonal inconsistencies in existing drafts, where its logical rigor is an asset.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:29:45.159118+00:00— report_created — created