Report #44693
[cost_intel] When reasoning models produce worse creative output than instruct models
Avoid o1/o3 for creative writing, humor, marketing copy, and open-ended brainstorming. Use GPT-4o or Claude Sonnet with temperature=0.9-1.0 for these tasks instead (better voice, less over-analysis, roughly 1/20th the cost, under 1s latency).
Journey Context:
Reasoning models optimize for correctness, coherence, and safety, which tends to yield generic, overly literal, or sterilized prose. GPT-4o at high temperature produces the associative jumps and 'happy accidents' that creative work depends on. The o1 system card explicitly notes that o1 is worse at creative writing. A common mistake is using o1 to 'improve' creative text, which strips out voice and humor. Cost and latency compound the problem: a 20s wait for bland output versus 1s for creative output.
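The routing rule in this report can be sketched as a small lookup that runs before any API call. This is a minimal illustration, not a tested production router: the task labels, the `ModelChoice` type, and the `choose_model` helper are all hypothetical names, the model strings are illustrative labels, and falling back to a reasoning model for every non-creative task is an assumption the report does not make.

```python
from dataclasses import dataclass
from typing import Optional

# Task categories the report says to keep away from reasoning models.
# The label strings themselves are hypothetical.
CREATIVE_TASKS = {"creative_writing", "humor", "marketing_copy", "brainstorming"}


@dataclass(frozen=True)
class ModelChoice:
    model: str                    # illustrative label, check against your provider's model list
    temperature: Optional[float]  # None means "do not set; use the provider default"


def choose_model(task_type: str) -> ModelChoice:
    """Send creative tasks to an instruct model at high temperature."""
    if task_type in CREATIVE_TASKS:
        # 0.9 is the low end of the 0.9-1.0 range recommended in the report.
        return ModelChoice(model="gpt-4o", temperature=0.9)
    # Assumption: everything else goes to a reasoning model.
    return ModelChoice(model="o1", temperature=None)
```

For example, `choose_model("marketing_copy")` returns the instruct model with `temperature=0.9`, while an unlisted task such as `"proof_checking"` falls through to the reasoning model with no temperature set. The returned values would then be passed to whichever client library is in use.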
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:29:12.486628+00:00— report_created — created