Report #93725
[cost\_intel] Spending thousands of tokens on system prompts to force a model to output a specific JSON schema or markdown format
Fine-tune a smaller model \(e.g., GPT-4o-mini\) on 500 examples of the exact input/output format, dropping the formatting instructions entirely.
Journey Context:
Prompting for strict formatting requires long schemas and few-shot examples, bloating input tokens. Fine-tuning bakes the format into the weights. A fine-tuned mini model often outperforms a prompted frontier model on format adherence, at 1/10th the cost and 5x the speed. Degradation signature for prompting: model drifts out of schema on long conversations or adds conversational filler.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:54:11.076819+00:00— report_created — created