Report #26371
[cost_intel] Using reasoning models for creative writing and open-ended brainstorming tasks
Restrict o1/o3 usage to tasks with verifiable correctness (math proofs, code competition problems, formal logic puzzles, complex debugging). Use GPT-4o or GPT-4o-mini for creative writing, marketing copy, and open-ended brainstorming.
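The routing rule above can be sketched as a small lookup, assuming each request has already been labeled with a task type upstream. The `TaskKind` labels, the `pick_model` helper, and its `budget_sensitive` flag are hypothetical illustrations: the report allows either o1 or o3 for verifiable tasks and does not say when to prefer GPT-4o-mini over GPT-4o. The returned strings are OpenAI API model identifiers.

```python
from enum import Enum


class TaskKind(Enum):
    # Tasks with verifiable correctness, per the report.
    MATH_PROOF = "math_proof"
    CODE_COMPETITION = "code_competition"
    FORMAL_LOGIC = "formal_logic"
    COMPLEX_DEBUGGING = "complex_debugging"
    # Open-ended tasks with no single correct answer.
    CREATIVE_WRITING = "creative_writing"
    MARKETING_COPY = "marketing_copy"
    BRAINSTORMING = "brainstorming"


VERIFIABLE_TASKS = frozenset({
    TaskKind.MATH_PROOF,
    TaskKind.CODE_COMPETITION,
    TaskKind.FORMAL_LOGIC,
    TaskKind.COMPLEX_DEBUGGING,
})


def pick_model(kind: TaskKind, budget_sensitive: bool = False) -> str:
    """Return a model identifier for a task (hypothetical helper)."""
    if kind in VERIFIABLE_TASKS:
        # The report allows o1 or o3 here; o3 is an arbitrary pick.
        return "o3"
    # Creative and open-ended work goes to an instruct model. Choosing
    # 4o-mini when budget_sensitive is an assumption, not the report's rule.
    return "gpt-4o-mini" if budget_sensitive else "gpt-4o"


if __name__ == "__main__":
    print(pick_model(TaskKind.MATH_PROOF))                            # o3
    print(pick_model(TaskKind.CREATIVE_WRITING))                      # gpt-4o
    print(pick_model(TaskKind.BRAINSTORMING, budget_sensitive=True))  # gpt-4o-mini
```

Keeping the verifiable set explicit (rather than defaulting to a reasoning model) means any new, unlabeled task type falls through to the cheaper instruct model.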
Journey Context:
Reasoning models are optimized for reward signals tied to verifiable outcomes (unit tests, mathematical proofs). On creative tasks they exhibit "overthinking": they add unnecessary hedging, generate longer but less engaging prose, and fail to capture narrative voice. On hard problems, benchmarks such as LiveCodeBench and Codeforces show accuracy gains above 30%, but creative writing evals (MT-Bench style) show neutral or negative deltas versus instruct models, even though reasoning models cost 10-30x more and run with 5-10x higher latency.
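A back-of-the-envelope sketch of what the quoted multipliers mean when the reasoning model buys no quality: the extra amount per unit of work is the instruct-model baseline times (multiplier minus 1). The `overhead_range` helper is hypothetical, and the baseline figures in the example are placeholders, not measured prices or latencies.

```python
from typing import Tuple

# Multiplier ranges quoted in the report: reasoning model vs. instruct model.
COST_MULTIPLIER = (10, 30)
LATENCY_MULTIPLIER = (5, 10)


def overhead_range(baseline: float, multipliers: Tuple[float, float]) -> Tuple[float, float]:
    """Return the (low, high) extra cost or latency from switching
    an instruct-model workload to a reasoning model.

    `baseline` is the instruct-model figure (e.g. dollars per batch or
    seconds per request); the overhead is baseline * (multiplier - 1).
    """
    low, high = multipliers
    return baseline * (low - 1), baseline * (high - 1)


if __name__ == "__main__":
    # Hypothetical baselines: $2.00 per batch, 1.5 s per request.
    print(overhead_range(2.0, COST_MULTIPLIER))     # (18.0, 58.0)
    print(overhead_range(1.5, LATENCY_MULTIPLIER))  # (6.0, 13.5)
```

Under these placeholder baselines, a batch costing $2.00 on an instruct model would cost an extra $18 to $58 on a reasoning model, and each request would wait an extra 6 to 13.5 seconds, with a neutral or negative quality delta on the creative evals cited above.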
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:40:01.111067+00:00 - report_created - created