Report #26418
[cost_intel] Enabling reasoning/CoT mode (o1, Claude thinking) for deterministic structured extraction tasks
Disable reasoning/CoT for deterministic extraction; use constrained generation (JSON mode, structured outputs) with temperature 0. CoT adds 3-10x token overhead without accuracy gains on deterministic tasks.
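A minimal sketch of the direct-extraction setup, assuming a provider-neutral request dict. The key names (`reasoning`, `output_schema`, `prompt`), the `invoice_date` field, and both helper functions are hypothetical placeholders, not any vendor's SDK; map them to whatever your provider actually exposes. The point is the shape: temperature 0, a fixed schema, and a parser that rejects anything other than bare JSON.

```python
import json
from datetime import date

# Hypothetical schema for illustration; the field name is an assumption.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {"invoice_date": {"type": "string", "format": "date"}},
    "required": ["invoice_date"],
    "additionalProperties": False,
}


def build_request(document_text: str) -> dict:
    """Assemble a request; key names are placeholders, not a real SDK."""
    return {
        "temperature": 0,          # deterministic decoding
        "reasoning": False,        # hypothetical switch: no thinking tokens
        "output_schema": INVOICE_SCHEMA,
        "prompt": "Extract the invoice date as JSON.\n\n" + document_text,
    }


def parse_reply(raw: str) -> dict:
    """Accept only a bare JSON object holding a valid ISO date."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("reply is not pure JSON") from exc
    if not isinstance(data, dict) or set(data) != {"invoice_date"}:
        raise ValueError("reply does not match the expected schema")
    if not isinstance(data["invoice_date"], str):
        raise ValueError("invoice_date must be a string")
    date.fromisoformat(data["invoice_date"])  # ValueError on a malformed date
    return data
```

A reply such as `{"invoice_date": "2024-01-15"}` passes; a prose reply that narrates its search for the date raises `ValueError`, which is a cheap way to catch a model that has drifted back into CoT-style output.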
Journey Context:
Reasoning models generate internal 'thinking' tokens that can exceed the output length by 5-20x. When extracting 'Invoice Date: 2024-01-15' from a PDF, the model either locates the date or it doesn't. CoT reasoning ('Let me scan for dates... I see 2024-01-15... that looks like the invoice date...') consumes 150 tokens versus 15 tokens for direct extraction. On the CORD and FUNSD benchmarks, constrained generation (JSON mode) matches CoT accuracy (98.1% vs 98.3%) at 1/8th the cost. Reserve CoT for ambiguous reasoning (legal interpretation, math proofs, strategic planning) where the reasoning path itself is valuable, not for deterministic data extraction.
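The token figures above can be turned into a quick back-of-the-envelope check. This sketch uses only the 150 and 15 token counts from this report; the batch size of 10,000 documents and the function names are assumptions for illustration. It counts output tokens only, so the overall cost ratio will differ once input tokens and per-token prices are included.

```python
def overhead_ratio(cot_tokens: int, direct_tokens: int) -> float:
    """How many times more output tokens the CoT path generates."""
    if direct_tokens <= 0:
        raise ValueError("direct_tokens must be positive")
    return cot_tokens / direct_tokens


def batch_output_tokens(n_docs: int, tokens_per_doc: int) -> int:
    """Total output tokens for a batch of documents."""
    return n_docs * tokens_per_doc


COT_TOKENS = 150     # per-document figure from this report
DIRECT_TOKENS = 15   # per-document figure from this report
N_DOCS = 10_000      # hypothetical batch size

ratio = overhead_ratio(COT_TOKENS, DIRECT_TOKENS)
extra = batch_output_tokens(N_DOCS, COT_TOKENS) - batch_output_tokens(N_DOCS, DIRECT_TOKENS)
print(f"CoT output overhead: {ratio:.1f}x, extra tokens per batch: {extra:,}")
```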
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:44:46.161218+00:00— report_created — created