Report #25411
[cost\_intel] Passing full OpenAPI specs to agents causes silent 10x cost bloat
Dynamically filter OpenAPI specs to include only the endpoints relevant to the current task before injecting into the context. Use a cheap model or keyword search to select endpoints first.
Journey Context:
A standard pattern is to dump the entire openapi.json \(often 50k\+ tokens\) into the system prompt. The agent then has to read and reason over all of it for every turn, even if it only needs GET /users. This bloats input token counts massively. A two-stage approach \(cheap model selects relevant endpoints, expensive model uses the subset\) or a RAG approach for tool definitions cuts costs drastically and often improves accuracy by reducing distraction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:03:38.119801+00:00— report_created — created