Report #53824
[cost_intel] GPT-4o fails on travel planning with 5+ hard constraints (budget + accessibility + time windows), producing impossible itineraries
Use o3-mini for NP-hard-style constraint satisfaction (travel, resource allocation, scheduling). On problems with more than 3 interacting hard constraints, o3-mini produces feasible plans 85% of the time vs. 35% for GPT-4o, making cost per successful plan 3x lower despite a 4x higher token cost.
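The cost-per-successful-plan comparison can be sketched as a retry-until-feasible model. This is a minimal sketch, not the accounting used in this report: the function name, the relative cost units (GPT-4o attempt = 1, o3-mini attempt = 4), and the `failure_cost` parameter (a hypothetical cost of detecting and discarding each infeasible plan) are all assumptions.

```python
def cost_per_feasible_plan(attempt_cost: float,
                           feasible_rate: float,
                           failure_cost: float = 0.0) -> float:
    """Expected spend to obtain one feasible plan when retrying until success.

    attempt_cost  -- cost of one model call (relative units, assumed)
    feasible_rate -- probability that a single call yields a feasible plan
    failure_cost  -- hypothetical cost of catching and reworking one bad plan
    """
    if not 0.0 < feasible_rate <= 1.0:
        raise ValueError("feasible_rate must be in (0, 1]")
    # Attempts until the first success are geometric with mean 1/p,
    # so the expected number of failures is (1 - p) / p.
    return (attempt_cost + (1.0 - feasible_rate) * failure_cost) / feasible_rate


# Feasibility rates from the summary above; cost units are assumed.
gpt4o_tokens_only = cost_per_feasible_plan(1.0, 0.35)
o3mini_tokens_only = cost_per_feasible_plan(4.0, 0.85)

# Same rates, with an assumed failure cost of 10 units per infeasible plan.
gpt4o_with_rework = cost_per_feasible_plan(1.0, 0.35, failure_cost=10.0)
o3mini_with_rework = cost_per_feasible_plan(4.0, 0.85, failure_cost=10.0)

print(gpt4o_tokens_only, o3mini_tokens_only)
print(gpt4o_with_rework, o3mini_with_rework)
```

Under these assumptions the outcome is sensitive to what a failed plan costs. Counting token cost alone, a 4x per-attempt premium is not offset by the higher feasibility rate (break-even is near 0.85 / 0.35, about 2.4x). Once each infeasible plan carries a substantial detection or rework cost, the reasoning model comes out well ahead, so it is worth plugging in your own figures before relying on any single ratio.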
Journey Context:
Instruction-tuned models tend to satisfy constraints greedily and locally, missing global interactions (e.g., 'museum closed Mondays' conflicting with 'fly out Sunday night'). They produce fluent but impossible plans. Reasoning models explicitly search the constraint space via chain-of-thought. The breakpoint is roughly 3 hard constraints; below that, GPT-4o is faster and good enough. Above it, the combinatorial explosion of constraint interactions calls for systematic reasoning to avoid contradictions.
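Whichever model is used, a generated itinerary can be checked mechanically before it is accepted. The following is a minimal sketch of such a feasibility check, built around the museum and Sunday-flight example above. The `Stop` structure, the `violations` function, the three-day trip, and all sample values are hypothetical and only illustrate how hard constraints interact.

```python
from dataclasses import dataclass

DAYS = ["Sat", "Sun", "Mon"]  # hypothetical trip calendar, in order


@dataclass(frozen=True)
class Stop:
    name: str
    day: str
    start: int                      # start hour, 24h clock
    end: int                        # end hour, 24h clock
    cost: float
    step_free: bool                 # accessible without stairs
    closed_on: frozenset = frozenset()


def violations(stops, budget, departure, need_step_free):
    """Return a list of violated hard constraints; an empty list means feasible."""
    dep_day, dep_hour = departure
    problems = []
    if sum(s.cost for s in stops) > budget:
        problems.append("total cost exceeds budget")
    for s in stops:
        if s.day in s.closed_on:
            problems.append(f"{s.name} is closed on {s.day}")
        if need_step_free and not s.step_free:
            problems.append(f"{s.name} is not step-free")
        # Compare (day index, hour) pairs so later days always sort after earlier ones.
        if (DAYS.index(s.day), s.end) > (DAYS.index(dep_day), dep_hour):
            problems.append(f"{s.name} ends after departure")
    ordered = sorted(stops, key=lambda s: (DAYS.index(s.day), s.start))
    for a, b in zip(ordered, ordered[1:]):
        if a.day == b.day and b.start < a.end:
            problems.append(f"{a.name} overlaps {b.name}")
    return problems


market = Stop("Market", "Sat", 10, 12, 20.0, True)
museum_monday = Stop("Museum", "Mon", 10, 12, 15.0, True, frozenset({"Mon"}))
museum_sunday = Stop("Museum", "Sun", 10, 12, 15.0, True, frozenset({"Mon"}))

bad = violations([market, museum_monday], budget=50.0,
                 departure=("Sun", 21), need_step_free=True)
good = violations([market, museum_sunday], budget=50.0,
                  departure=("Sun", 21), need_step_free=True)
print(bad)
print(good)
```

A checker like this does not make the planning problem easier, but it turns "fluent but impossible" output into an explicit list of violations that can trigger a retry or an escalation to a reasoning model.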
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:50:27.838881+00:00 - report_created - created