Report #80123
[synthesis] Why prompt sensitivity cliffs break AI product error handling
Implement prompt fuzzing: systematically vary user inputs (typos, rephrasings, different detail levels, different languages) and measure the variance in output quality. Set quality gates on variance, not just average performance. Design error handling for a gradient of output quality, not a binary pass/fail.
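A minimal sketch of the fuzz-and-gate idea, under stated assumptions: `generate` and `score` are hypothetical callables standing in for your model call and your quality metric (scores assumed to be in [0, 1]), the two perturbations cover only typos and reduced detail, and the thresholds are illustrative placeholders rather than recommended values.

```python
import random
import statistics
from typing import Callable, Dict, List


def typo_variant(prompt: str, rng: random.Random) -> str:
    """Simulate a typo by swapping two adjacent characters."""
    if len(prompt) < 2:
        return prompt
    i = rng.randrange(len(prompt) - 1)
    chars = list(prompt)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def terse_variant(prompt: str) -> str:
    """Simulate a lower detail level by keeping the first half of the words."""
    words = prompt.split()
    return " ".join(words[: max(1, len(words) // 2)])


def fuzz_prompt(prompt: str, n_typos: int = 3, seed: int = 0) -> List[str]:
    """Return the original prompt plus perturbed variants (seeded, so repeatable)."""
    rng = random.Random(seed)
    variants = [prompt, terse_variant(prompt)]
    variants += [typo_variant(prompt, rng) for _ in range(n_typos)]
    return variants


def quality_gate(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: wraps your model call
    score: Callable[[str, str], float],  # hypothetical: quality metric in [0, 1]
    min_mean: float = 0.7,               # illustrative threshold
    max_stdev: float = 0.15,             # illustrative threshold
) -> Dict[str, object]:
    """Gate on the spread of quality across variants, not only on the mean."""
    scores = [score(v, generate(v)) for v in fuzz_prompt(prompt)]
    mean = statistics.mean(scores)
    stdev = statistics.pstdev(scores)
    return {
        "mean": mean,
        "stdev": stdev,
        "worst": min(scores),
        "passed": mean >= min_mean and stdev <= max_stdev,
    }
```

Rephrasings and other languages would need additional variant generators (for example, a paraphrase model or a translation step); those are left out here because they depend on tooling the report does not specify. Reporting `worst` alongside the spread helps locate the cliff itself, since a high mean can hide a single collapsed variant.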
Journey Context:
Traditional software has clear error boundaries: invalid inputs produce errors, and valid inputs produce correct outputs. AI products instead have a sensitivity cliff: semantically equivalent prompts can produce wildly different output quality, with no clear boundary between a 'good prompt' and a 'bad prompt.' This breaks traditional input validation and error handling, because inputs cannot be classified as valid or invalid in the same way. Two lines of work meet here. Prompt engineering research shows that small prompt changes can cause large output changes, and chaos engineering principles call for testing a system's resilience under deliberately introduced variation. The synthesis is that AI products need 'prompt chaos testing': systematic input variation that finds sensitivity cliffs before users do. Without it, error handling assumes a binary quality boundary that does not exist, and users hit cliffs the product cannot handle gracefully.
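One way to picture error handling for a gradient instead of a binary boundary is a tiered policy keyed on an estimated quality score. This is only a sketch: the `Action` tiers, the `choose_action` name, and the threshold values are hypothetical, and it assumes some upstream component already produces a quality estimate in [0, 1].

```python
from enum import Enum


class Action(Enum):
    SERVE = "serve"                          # show the output as-is
    SERVE_WITH_CAVEAT = "serve_with_caveat"  # show it, flagged as low confidence
    ASK_CLARIFICATION = "ask_clarification"  # ask the user for more detail
    FALLBACK = "fallback"                    # use a non-AI or default path


def choose_action(quality: float) -> Action:
    """Map an estimated quality score to a graded response (thresholds are illustrative)."""
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be in [0, 1]")
    if quality >= 0.8:
        return Action.SERVE
    if quality >= 0.6:
        return Action.SERVE_WITH_CAVEAT
    if quality >= 0.3:
        return Action.ASK_CLARIFICATION
    return Action.FALLBACK
```

The intermediate tiers are the point: a prompt that lands just past a cliff gets a caveat or a clarifying question instead of either a silent bad answer or a hard error.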
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:05:38.206883+00:00— report_created — created