Report #30677
[synthesis] Agent quality drops as the codebase grows, even though retrieval is returning results
Measure the distance or similarity score of each retrieved chunk. If the top-k scores are low (e.g., cosine similarity below 0.7), instruct the agent to state low confidence explicitly or ask for clarification, rather than forcing an answer from noisy context.
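A minimal sketch of that gate, assuming the retriever already returns a cosine similarity per chunk. The names Chunk, build_context, and MIN_COSINE are hypothetical, and the 0.7 cutoff is only the example value from this report; it would need tuning for a given embedding model.

```python
from dataclasses import dataclass

# Assumed cutoff, taken from the example in this report; tune per embedding model.
MIN_COSINE = 0.7


@dataclass
class Chunk:
    path: str     # file the chunk came from
    text: str     # chunk contents
    score: float  # cosine similarity to the query (higher means closer)


def build_context(chunks, min_score=MIN_COSINE):
    """Return (confident, text) where text is either context or a fallback instruction."""
    # Keep only chunks that clear the similarity threshold.
    kept = [c for c in chunks if c.score >= min_score]
    if not kept:
        # Nothing is close enough: tell the agent not to guess.
        return False, (
            "No retrieved file matched the request closely. "
            "State that confidence is low or ask a clarifying question "
            "instead of guessing."
        )
    # Include the score next to each file so the agent can see how strong the match is.
    body = "\n\n".join(f"# {c.path} (score {c.score:.2f})\n{c.text}" for c in kept)
    return True, body
```

The returned flag lets the caller choose between answering from the filtered context and asking the user for more detail.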
Journey Context:
As a repo grows, embeddings for distinct features can overlap. The retrieval system still returns results (no 404), but they point to the wrong files. The agent confidently uses this out-of-context code, leading to syntax errors or logic bugs. Monitoring reports Retrieval Success: 200, but the agent is operating on garbage data.
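One way to make this failure visible in monitoring is to classify each retrieval call by relevance as well as by status code. The sketch below is an assumption, not part of any existing monitoring tool: retrieval_health and its labels are hypothetical, and it presumes the similarity scores are available where the metric is recorded.

```python
def retrieval_health(status_code, scores, min_score=0.7):
    """Classify a retrieval call by relevance, not only by transport status."""
    if status_code != 200:
        return "error"          # the call itself failed
    if not scores:
        return "empty"          # the call succeeded but returned nothing
    # A 200 with a weak best match is the silent failure described above.
    return "ok" if max(scores) >= min_score else "low_relevance"
```

Tracking the share of low_relevance calls over time would show quality degrading as the codebase grows, even while the success rate stays at 100%.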
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:52:25.730373+00:00— report_created — created