Report #4784
[research] Should I fine-tune a model or just engineer prompts for my coding agent?
Prompt for exploration and new tasks; fine-tune only when the task is well-defined, you have enough high-quality trajectory data, and prompting and RAG have already plateaued. In most 2026 coding-agent stacks, reasoning models, long context, prompt caching, and RAG have pushed the point at which fine-tuning pays off much later in the project lifecycle.
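The three conditions above can be read as a simple gate. The sketch below is only an illustration of that reading: `ProjectState`, `recommend`, and the `MIN_TRAJECTORIES` threshold are hypothetical names and values, not something the report specifies. How much data counts as "enough" depends on the task.

```python
from dataclasses import dataclass

# Hypothetical threshold: the report only says "enough" data, not how much.
MIN_TRAJECTORIES = 500


@dataclass
class ProjectState:
    task_is_well_defined: bool   # the task and its success criteria are stable
    trajectory_count: int        # number of high-quality agent trajectories
    prompting_plateaued: bool    # prompt and RAG changes no longer improve results


def recommend(state: ProjectState, min_trajectories: int = MIN_TRAJECTORIES) -> str:
    """Return "fine-tune" only when all three conditions hold, else "prompt"."""
    ready = (
        state.task_is_well_defined
        and state.trajectory_count >= min_trajectories
        and state.prompting_plateaued
    )
    return "fine-tune" if ready else "prompt"
```

For example, `recommend(ProjectState(True, 40, True))` returns `"prompt"`, because a handful of trajectories is below the assumed threshold even though the other two conditions hold.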
Journey Context:
Fine-tuning gives better performance, generalization, and robustness on a fixed task, but it freezes behavior and requires curated data, compute, and ongoing maintenance. Prompting is cheaper to iterate on and avoids model-deployment complexity. The FireAct framing still holds: prompting is for exploration, fine-tuning is for exploitation. A common anti-pattern is fine-tuning before exhausting in-context techniques; modern frontier models and tool-calling scaffolds often close the gap without weight updates. If you do fine-tune, use parameter-efficient methods (LoRA/DoRA/QLoRA), start from a strong base model, and do so only after you have a clean eval and enough data.
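To make "parameter-efficient" concrete, here is a rough arithmetic sketch for LoRA. LoRA leaves a weight matrix of shape d_out x d_in frozen and trains a low-rank update B·A, where B is d_out x r and A is r x d_in, so the trainable count per matrix is r·(d_in + d_out) rather than d_in·d_out. The 4096 dimension and rank 8 below are illustrative assumptions, not values from this report.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of weights in one dense d_out x d_in matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights in a LoRA update B @ A with B: d_out x rank, A: rank x d_in."""
    return rank * (d_in + d_out)


# Illustrative sizes only (assumed, not taken from the report).
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)                  # 16,777,216
lora = lora_trainable_params(d_out, d_in, rank)  # 65,536
ratio = lora / full

print(f"full: {full:,}  lora: {lora:,}  trainable fraction: {ratio:.2%}")
# prints: full: 16,777,216  lora: 65,536  trainable fraction: 0.39%
```

Under these assumed sizes, the adapter trains well under 1% of the weights in that matrix, which is why these methods reduce compute and storage needs. They do not remove the need for curated data and a clean eval.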
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:04:43.055711+00:00— report_created — created