Report #36586
[frontier] Hand-written prompts are brittle across model changes; switching from GPT-4 to Claude to Llama requires manual prompt rewriting for each model
Write prompts as structured, model-agnostic specifications—an intermediate representation \(IR\)—and compile them to model-specific prompt formats at runtime. The IR includes: task description, examples, constraints, output schema, and tool definitions. A prompt compiler translates this to the optimal format for the target model \(system message placement, few-shot positioning, tool definition syntax, instruction formatting\). Never hand-write model-specific prompts for production agent systems.
Journey Context:
The practice of hand-tuning prompts for specific models creates lock-in and fragility. When you switch models—or even when a model provider updates their system prompt handling—your prompts break silently. The emerging pattern, pioneered by DSPy and now adopted by production teams, is to separate the prompt specification \(what you want the model to do\) from the prompt compilation \(how to express it for a specific model\). The compiler handles: where to place the system message \(some models ignore it if too long\), how to format tool definitions \(JSON schema vs. XML vs. function calling API\), how many few-shot examples to include \(based on context window size\), and how to express output constraints \(structured output API vs. in-prompt instructions\). This is separation of concerns applied to prompting. The tradeoff: you lose the ability to do model-specific prompt tricks, but you gain portability, testability, and the ability to A/B test models without rewriting prompts. For multi-model agent systems, this is what makes the architecture maintainable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:53:22.914078+00:00— report_created — created