Report #99481
[counterintuitive] When a prompt gives a bad answer, the best fix is to rephrase and retry the same single prompt.
Generate multiple candidate outputs in parallel and select among them or ensemble them. Use self-consistency (majority vote over sampled answers, which suits structured answers), a separate evaluator/judge model, or programmatic verifiers such as tests or schema checks. Evaluate prompt quality on a held-out set of inputs rather than through one-off retries.
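A minimal Python sketch of self-consistency plus held-out evaluation follows. It assumes a hypothetical `sample(i)` callable that wraps one temperature > 0 model call and returns the extracted final answer; no real client library is used. `normalize`, `self_consistency`, `accuracy` and `toy_model` are illustrative names, not an existing API. Voting only makes sense for structured answers that can be compared after normalization, such as numbers, labels or multiple-choice letters.

```python
import random
from collections import Counter
from typing import Callable, Optional


def normalize(answer: str) -> str:
    """Canonicalize a short structured answer so trivially different strings vote together."""
    return answer.strip().lower()


def self_consistency(sample: Callable[[int], Optional[str]], n: int = 5) -> Optional[str]:
    """Majority vote over n sampled final answers.

    `sample(i)` is a hypothetical stand-in for the i-th temperature > 0 model call;
    it returns the extracted final answer, or None if extraction failed.
    """
    votes: Counter = Counter()
    for i in range(n):
        answer = sample(i)
        if answer is not None:  # unparseable samples simply don't vote
            votes[normalize(answer)] += 1
    if not votes:
        return None
    # most_common breaks ties by first-seen order, so a tie goes to the answer that appeared first.
    return votes.most_common(1)[0][0]


def accuracy(solve: Callable[[str], Optional[str]], held_out: list[tuple[str, str]]) -> float:
    """Fraction of held-out (question, gold answer) pairs that `solve` gets right."""
    hits = 0
    for question, gold in held_out:
        pred = solve(question)
        if pred is not None and normalize(pred) == normalize(gold):
            hits += 1
    return hits / len(held_out)


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_model(question: str) -> str:
        # Toy stand-in (assumption): correct 60% of the time, otherwise one of three distractors.
        return "yes" if rng.random() < 0.6 else rng.choice(["no", "maybe", "unknown"])

    held_out = [(f"q{k}", "yes") for k in range(200)]
    single = accuracy(toy_model, held_out)
    voted = accuracy(lambda q: self_consistency(lambda i: toy_model(q), n=9), held_out)
    print(f"single sample: {single:.2f}   self-consistency@9: {voted:.2f}")
```

Running the file compares one sample against a 9-way vote on the toy model. Because the wrong answers are split across three distractors while the correct one holds a 60% share, the vote should usually come out well ahead. The same `accuracy` function is also how you would compare two prompt variants: score both on the same held-out set instead of judging from a single retry.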
Journey Context:
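Single-prompt retries optimize for luck, not reliability: a rephrasing that fixes one bad answer says little about how the prompt performs across inputs. Wang et al.'s self-consistency work showed that sampling diverse chain-of-thought (CoT) reasoning paths and taking a majority vote over their final answers gives large accuracy gains over a single greedily decoded CoT chain. Many modern agent systems use best-of-N sampling, scoring the candidates with a reward model or an LLM-as-judge and keeping the best one. The key shift is from crafting the one perfect prompt to building a sampling-and-selection pipeline.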
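Best-of-N selection with an optional verifier can be sketched in a few lines. This is a minimal Python illustration, not any particular framework's API: `generate` and `score` are hypothetical callables standing in for a sampler and for a reward model or LLM-as-judge, and `best_of_n`, `verify` and the canned judge scores are assumptions made up for the example. The built-in `compile` serves as a real, if minimal, verifier.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def best_of_n(
    generate: Callable[[int], T],
    score: Callable[[T], float],
    n: int = 8,
    verify: Optional[Callable[[T], bool]] = None,
) -> Optional[T]:
    """Sample n candidates, drop any that fail `verify`, return the top-scoring survivor.

    `generate(i)` stands in for the i-th sampled completion and `score` for a reward
    model or LLM-as-judge (higher is better); both are hypothetical placeholders.
    """
    candidates = [generate(i) for i in range(n)]
    if verify is not None:
        # Cheap hard checks (tests, parsers, schema validation) run before the judge.
        candidates = [c for c in candidates if verify(c)]
    if not candidates:
        return None
    # max() returns the first candidate among equal top scores.
    return max(candidates, key=score)


if __name__ == "__main__":
    # Canned drafts and judge scores, invented purely for illustration.
    drafts = [
        "def double(x): return x * 2",
        "def double(x) return x * 2",  # missing colon: not valid Python
        "def double(x):\n    return 2 * x",
    ]
    judge_scores = {drafts[0]: 0.7, drafts[1]: 0.9, drafts[2]: 0.8}

    def compiles(src: str) -> bool:
        """Verifier: does the candidate parse as Python?"""
        try:
            compile(src, "<candidate>", "exec")
            return True
        except SyntaxError:
            return False

    print(best_of_n(lambda i: drafts[i], judge_scores.__getitem__, n=3, verify=compiles))
```

In the demo the judge prefers the second draft, but it fails `compile`, so the verifier drops it and the third draft is returned. Running verifiers before the judge is a design choice: they are cheap and exact, so the more expensive and fallible judge only scores candidates that already pass.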
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:12:31.382910+00:00— report_created — created