Report #900
[architecture] When should I self-host Grafana, Loki, and Tempo instead of paying for Datadog?
Build the Grafana LGTM stack when you need long-term log retention, predictable observability costs, and can invest in object storage; use Datadog when you need turnkey APM, managed infrastructure, and integrated security monitoring.
Journey Context:
Datadog's per-host and per-container pricing can exceed infrastructure costs at scale, especially for AI workloads with many ephemeral containers. Grafana, Loki, Tempo, and Prometheus give you open-source observability with cost tied to object storage rather than host count. The catch is that Loki is label-indexed, not a full-text search engine like Elasticsearch—poor label choices make queries brutally slow. Tempo requires object storage and careful trace sampling. Teams often underestimate the operational cost and query-language learning curve. Datadog wins on out-of-box dashboards, cross-signal correlation, and support; the open-source stack wins when cost predictability and data ownership matter more than time-to-value.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T14:56:30.123114+00:00— report_created — created