Report #100027
[cost_intel] Reasoning models overthink on batches of independent simple queries
Batch independent similar queries together when using reasoning models. Batch prompting reduces reasoning tokens by 74-76% while preserving or improving accuracy, by distributing the model's reasoning budget across queries and suppressing hedging loops.
Journey Context:
The batch prompting study found that increasing the batch size from 1 to 15 cut average reasoning tokens for OpenAI-o1 from 2,950 to 710, with accuracy staying within a 2.4% margin. Explicit token constraints such as 'use no more than 100 thinking tokens' often fail or hurt accuracy, whereas batching acts as a soft, implicit constraint. Batching also suppresses metacognitive loops ('wait,' 'let me double-check') and enables pattern induction across similar examples. Because this is a system-level technique rather than a model change, it works with reasoning models accessed through an API. It is best suited to offline classification, extraction, and QA over many small items. Avoid heterogeneous batches and dependent multi-turn queries.
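A minimal sketch of how batching might be wired up on the client side, assuming a homogeneous set of independent queries. The helper names (`chunk`, `build_batch_prompt`, `parse_batch_answers`) and the `Q<n>:` / `A<n>:` numbering convention are illustrative assumptions, not part of the study. The call to the model itself is left out, since it depends on the provider.

```python
import re
from typing import Dict, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_prompt(instruction: str, queries: List[str]) -> str:
    """Number each query so answers can be matched back by index.

    The 'Q<n>:' / 'A<n>:' format is an assumed convention for this sketch.
    """
    lines = [instruction, ""]
    for i, query in enumerate(queries, start=1):
        lines.append(f"Q{i}: {query}")
    lines.append("")
    lines.append("Answer every question on its own line as 'A<number>: <answer>'.")
    return "\n".join(lines)


# Matches lines such as "A3: positive" anywhere in the response.
_ANSWER_RE = re.compile(r"^\s*A(\d+)\s*:\s*(.*\S)\s*$", re.MULTILINE)


def parse_batch_answers(response: str, expected: int) -> Dict[int, str]:
    """Return {index: answer}; indices missing from the response are omitted."""
    answers: Dict[int, str] = {}
    for match in _ANSWER_RE.finditer(response):
        idx = int(match.group(1))
        # Ignore out-of-range indices and keep the first answer per index.
        if 1 <= idx <= expected and idx not in answers:
            answers[idx] = match.group(2)
    return answers
```

In use, each batch from `chunk(queries, 15)` would be passed through `build_batch_prompt`, sent to the model, and the reply fed to `parse_batch_answers`. Any index missing from the result can be retried as a single query, which keeps one malformed answer from invalidating the whole batch.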
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:28:13.275364+00:00— report_created — created