Report #46867
[frontier] Brittle manual prompt engineering breaking when models change or scale
Adopt DSPy compilation: define module signatures (input/output specs) and use bootstrap few-shot learning with metric-driven optimization to compile prompts, replacing artisanal prompt crafting with software engineering principles.
Journey Context:
Hand-crafted few-shot examples and 'you are a helpful assistant' prompts are brittle, unversioned, and tend to break when the underlying model changes. DSPy (Stanford NLP, 2024-2025) shifts to 'declarative language model programming': developers define Pythonic signatures (`question -> answer`) and modules (ChainOfThought, ReAct), then compile them against a small labeled dataset. The optimizer runs the program on the training examples, keeps the traces that pass a custom metric as few-shot demonstrations, and emits optimized prompts (or even fine-tuned models) that maximize that metric. Prompts become compiled artifacts (much like compiled CUDA kernels) rather than hand-edited source code, so a model change means recompiling instead of rewriting.
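To make the bootstrap step concrete, here is a minimal, dependency-free sketch of the idea. It is not DSPy's implementation or API: `Program`, `toy_lm`, `exact_match`, and `bootstrap_few_shot` are hypothetical names invented for illustration, and `toy_lm` is a deterministic stand-in for a real model call. In DSPy itself, the corresponding pieces would be a module such as `dspy.ChainOfThought("question -> answer")` and an optimizer such as `dspy.BootstrapFewShot(metric=...)` with its `compile(student, trainset=...)` method.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Example:
    question: str
    answer: str


@dataclass
class Program:
    """Hypothetical 'prompt program': a signature plus few-shot demos."""
    signature: str  # e.g. "question -> answer"
    demos: List[Example] = field(default_factory=list)

    def render(self, question: str) -> str:
        # The prompt text is derived from the program, never hand-edited.
        lines = [f"Signature: {self.signature}"]
        for demo in self.demos:
            lines.append(f"Q: {demo.question}\nA: {demo.answer}")
        lines.append(f"Q: {question}\nA:")
        return "\n".join(lines)


def toy_lm(prompt: str) -> str:
    """Stand-in for a model call (assumption): it only knows addition."""
    question = prompt.rsplit("Q: ", 1)[1].split("\nA:", 1)[0]
    left, op, right = question.split()
    if op == "+":
        return str(int(left) + int(right))
    return "unknown"


def exact_match(example: Example, prediction: str) -> bool:
    """Custom metric: the prediction must equal the labeled answer."""
    return example.answer == prediction.strip()


def bootstrap_few_shot(
    program: Program,
    trainset: List[Example],
    metric: Callable[[Example, str], bool],
    lm: Callable[[str], str],
    max_demos: int = 2,
) -> Program:
    """Run the program on training examples and keep only the traces
    that pass the metric as demonstrations for the compiled program."""
    demos: List[Example] = []
    for example in trainset:
        prediction = lm(program.render(example.question))
        if metric(example, prediction):
            demos.append(Example(example.question, prediction))
        if len(demos) >= max_demos:
            break
    # Return a new program; the uncompiled one is left untouched.
    return Program(program.signature, demos)


trainset = [
    Example("1 + 2", "3"),
    Example("7 - 4", "3"),   # the toy model fails this one, so it is dropped
    Example("10 + 5", "15"),
    Example("2 + 2", "4"),
]

student = Program("question -> answer")
compiled = bootstrap_few_shot(student, trainset, exact_match, toy_lm, max_demos=2)
print(compiled.render("3 + 4"))
```

The design point is that the demonstrations are selected by the metric rather than by hand, so switching models only requires rerunning `bootstrap_few_shot` against the same training set and metric.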
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:08:19.649516+00:00— report_created — created