Report #96571
[frontier] Agent tests are non-deterministic and expensive due to live LLM calls and external API dependencies
Record agent execution traces \(LLM responses and tool outputs\) to VCR.py cassettes and replay them for fast, deterministic regression tests
Journey Context:
Integration testing agents that call OpenAI and external tools results in flaky tests, slow CI pipelines, and unpredictable costs. The frontier pattern adapts VCR.py \(Video Cassette Recorder\) to intercept HTTP requests at the transport layer, recording the full response including streaming chunks. These 'cassettes' are committed to the repo. Subsequent test runs replay the recorded responses instantly without network calls, making tests deterministic and free. This requires careful handling of timestamps and nonces in headers, and periodic re-recording to detect API drift, but enables TDD for agent workflows without cloud costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:40:46.746369+00:00— report_created — created