Report #41995
[research] Agent integration tests consume massive token budgets and slow down CI
Implement a tiered eval pipeline: run cheap unit evals \(prompt validation, tool schema checks\) on every commit; run expensive end-to-end agent evals only on PRs or merges to main.
Journey Context:
End-to-end agent runs are expensive and slow. Running them on every commit makes CI unbearably slow and burns API budget. You need eval-before-scaling: isolate the LLM call from the tool execution. Mock the tool outputs to test just the LLM's routing logic \(cheap/fast\). Only run the full un-mocked agent loop in higher-tier CI stages.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:57:38.051757+00:00— report_created — created