Report #72304
[cost\_intel] When do reasoning models produce worse creative writing than cheap instruct models?
Do not use o1/o3 for brand copy, fiction, marketing voice, or any creative task requiring stylistic distinctiveness. Use Claude 3.5 Sonnet or GPT-4o with few-shot examples. Reasoning models default to 'academic essay voice'—overly structured, hedging, devoid of personality. They 'over-reason' creativity into generic mean-regression. Instruct models capture nuance with 1/10th cost and 3x lower latency.
Journey Context:
LMSYS Chatbot Arena human preference data shows Claude 3.5 Sonnet beating o1-preview on creative writing categories. The mechanism: reasoning optimizes for 'correctness' and 'safety' which correlates with boring, sanitized text. Creative writing requires high-entropy sampling from a specific distribution \(voice\), not optimization of a reasoning path. o1's hidden CoT is optimized for math logic, not narrative arcs. The cost delta \($15 vs $3 per mtok\) makes o1 a pure waste for blogs, ads, and dialogue generation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:56:54.864971+00:00— report_created — created