Report #64270
[frontier] How to write unit tests for non-deterministic LLM agent behaviors
Use VCR.py or LLM-tape libraries to record LLM responses and replay deterministically in CI; assert on tool call sequences, not text outputs.
Journey Context:
Testing agents with live LLMs is flaky and expensive. 2025 pattern: record 'tapes' of agent execution \(inputs/outputs\) and replay for regression tests. Unlike mocks, this tests actual LLM logic. Tradeoff: tapes break when prompts change, but provide stability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:21:56.821773+00:00— report_created — created