Report #98170
[synthesis] Prompt changes ship as untested behavioral code
Treat prompts as versioned, reviewable artifacts: store them in a registry, run eval suites on every change, require review, and promote through staging and canary. Load prompts dynamically so experiments don't require code deploys, but lifecycle gates still apply.
Journey Context:
A prompt change can alter behavior as much as a model swap, yet teams often edit prompts in a UI or hardcode them as strings, with no tests, review, or rollback path. The anti-pattern is 'prompt engineering as vibes' rather than as engineering. The fix is to load prompts from a registry, version them alongside models, and run them through the same eval and canary pipeline that code changes go through. This preserves agility, since PMs can run wording experiments without waiting on an engineering deploy, while preventing untested behavioral changes from reaching users. Hardcoded prompt strings are a code smell because they make both testing and rollback harder: any change or revert needs a code deploy.
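The flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not a reference implementation: `PromptRegistry`, `PromptVersion`, `STAGES`, `toy_eval`, and the 0.9 threshold are all hypothetical names and values, and `toy_eval` merely stands in for a real eval suite that would score model outputs.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical promotion order; adjust to your own environments.
STAGES = ("staging", "canary", "production")


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: int
    template: str
    reviewed_by: Optional[str] = None


class PromptRegistry:
    """In-memory stand-in for a real prompt registry (illustrative only)."""

    def __init__(self, eval_fn: Callable[[PromptVersion], float], min_score: float = 0.9):
        self._eval_fn = eval_fn
        self._min_score = min_score
        self._versions: Dict[str, List[PromptVersion]] = {}
        # (prompt name, stage) -> version numbers promoted there, oldest first
        self._history: Dict[Tuple[str, str], List[int]] = {}

    def register(self, name: str, template: str) -> PromptVersion:
        """Store a new immutable version; it is not live anywhere yet."""
        versions = self._versions.setdefault(name, [])
        pv = PromptVersion(name, len(versions) + 1, template)
        versions.append(pv)
        return pv

    def approve(self, name: str, version: int, reviewer: str) -> None:
        """Record that a human reviewed this version."""
        versions = self._versions[name]
        versions[version - 1] = replace(versions[version - 1], reviewed_by=reviewer)

    def promote(self, name: str, version: int, stage: str) -> None:
        """Apply the lifecycle gates: review, stage order, then evals."""
        pv = self._versions[name][version - 1]
        if pv.reviewed_by is None:
            raise PermissionError("prompt version has not been reviewed")
        idx = STAGES.index(stage)
        if idx > 0 and version not in self._history.get((name, STAGES[idx - 1]), []):
            raise RuntimeError(f"must pass {STAGES[idx - 1]} before {stage}")
        if self._eval_fn(pv) < self._min_score:
            raise ValueError("eval score below threshold")
        self._history.setdefault((name, stage), []).append(version)

    def rollback(self, name: str, stage: str = "production") -> int:
        """Drop the newest promoted version and return the one now live."""
        history = self._history.get((name, stage), [])
        if len(history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        history.pop()
        return history[-1]

    def load(self, name: str, stage: str = "production") -> PromptVersion:
        """Called at request time, so a prompt change needs no code deploy."""
        history = self._history.get((name, stage), [])
        if not history:
            raise LookupError(f"no version of {name!r} is live in {stage}")
        return self._versions[name][history[-1] - 1]


def toy_eval(pv: PromptVersion) -> float:
    """Placeholder eval: fraction of simple structural checks that pass."""
    checks = ["{question}" in pv.template, len(pv.template) <= 500]
    return sum(checks) / len(checks)
```

In this sketch the application calls `load("support")` on each request instead of embedding the string, so a wording experiment is a `register`, `approve`, `promote` sequence, and a bad change is undone with `rollback` rather than a redeploy.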
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:20:45.667340+00:00 — report_created — created