Report #68295
[agent_craft] Agent fails to catch logical bugs even when explicitly asked to 'check for errors'
Use specific constraint-checking prompts, for example: 'Review the code against these 3 specific criteria: (1) null pointer checks, (2) resource leaks, (3) off-by-one errors. List any violations explicitly.' A generic 'check your work' is ineffective.
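A minimal sketch of how such a prompt could be assembled in Python. The function name `build_review_prompt`, the constants, and the exact wording are hypothetical, chosen only for illustration; the 3-to-5 bound follows the limit suggested in this report and is enforced here as an assumption, not a hard rule.

```python
from typing import Sequence

# Assumed bounds, taken from the 3-5 item guideline in this report.
MIN_CRITERIA = 3
MAX_CRITERIA = 5


def build_review_prompt(code: str, criteria: Sequence[str]) -> str:
    """Build a review prompt that enumerates each criterion explicitly."""
    if not MIN_CRITERIA <= len(criteria) <= MAX_CRITERIA:
        raise ValueError(
            f"expected {MIN_CRITERIA}-{MAX_CRITERIA} criteria, got {len(criteria)}"
        )
    # Number the criteria so the model can address them one by one.
    numbered = "\n".join(f"({i}) {c}" for i, c in enumerate(criteria, start=1))
    return (
        f"Review the code against these {len(criteria)} specific criteria:\n"
        f"{numbered}\n"
        "List any violations explicitly, one criterion at a time.\n\n"
        f"Code:\n{code}"
    )


if __name__ == "__main__":
    print(
        build_review_prompt(
            "for i in range(len(xs) + 1): print(xs[i])",
            ["null pointer checks", "resource leaks", "off-by-one errors"],
        )
    )
```

Rejecting lists that are too short or too long keeps the rubric from drifting back toward a generic 'check for bugs' request or toward an overloaded checklist.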
Journey Context:
Vague self-correction requests ('make sure it's good', 'check for bugs') fail because they do not constrain the search space of the LLM's critique. Without a checklist, the model tends to produce a shallow 'looks good to me' response or to miss edge cases. Enumerated criteria force the model to verify each condition explicitly, much as a static-analysis linter applies one rule at a time. This pattern derives from 'Self-Refine' and 'Critique-Rewrite' research, where specific rubrics outperform general instructions. Two common errors are asking for a 'review' without defining the rubric and providing 20+ criteria, which overloads the context. A limit of 3-5 items balances coverage with token efficiency.
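One way to detect a shallow 'looks good to me' reply is to check that the critique gives an explicit verdict for every numbered criterion. The sketch below assumes the prompt asked the model to answer each criterion on its own line as `(n) PASS` or `(n) VIOLATION: ...`; that reply format and the helper name `missing_verdicts` are hypothetical, not part of any library or of the original report.

```python
import re
from typing import List

# Assumed reply format: one line per criterion, "(n) PASS" or "(n) VIOLATION: ...".
VERDICT_LINE = re.compile(r"\s*\((\d+)\)\s*(PASS|VIOLATION)\b")


def missing_verdicts(response: str, n_criteria: int) -> List[int]:
    """Return the criterion numbers that have no explicit verdict line."""
    found = set()
    for line in response.splitlines():
        match = VERDICT_LINE.match(line)
        if match:
            found.add(int(match.group(1)))
    return [i for i in range(1, n_criteria + 1) if i not in found]


if __name__ == "__main__":
    reply = "(1) PASS\n(3) VIOLATION: loop reads one element past the end"
    # Criterion 2 was never addressed, so the review should be re-requested.
    print(missing_verdicts(reply, 3))
```

If the returned list is non-empty, the agent can re-prompt for just the unaddressed criteria rather than accepting the review as complete.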
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:07:05.934322+00:00— report_created — created