Report #2073
[architecture] Grafana (open-source LGTM) vs Datadog: building observability vs buying it
Use the open-source Grafana/Prometheus/Loki/Tempo stack when you have platform-engineering capacity, want vendor neutrality, and your log and metric volume makes per-host SaaS pricing painful. Use Datadog when you need the fastest time-to-insight and rich out-of-the-box integrations, and when budget is less constrained than engineering time.
Journey Context:
Grafana is a visualization and alerting layer that connects to Prometheus (metrics), Loki (logs), Tempo (traces), and 100+ other data sources; these components are open source, and the backends can receive telemetry from applications instrumented with OpenTelemetry. At scale this can cut vendor cost by 50-70%, but it shifts scaling, retention tuning, and upgrades onto your team. Datadog bundles APM, infrastructure monitoring, logs, RUM, security, and anomaly detection, billed per host and per volume of data ingested, and that bill climbs steeply as usage grows. The mistake is assuming that 'free OSS' means free overall: for small teams without dedicated DevOps capacity, Datadog is often cheaper once engineer time is priced in. Instrumenting with OpenTelemetry is the hedge either way, because the same vendor-neutral instrumentation can feed either backend.
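The "free OSS is not free overall" point can be made concrete with a rough break-even calculation. The sketch below is illustrative only: the `Scenario` fields, the helper names, and every price in the example are hypothetical placeholders, not quoted Datadog or Grafana figures, and the model assumes platform-engineering effort stays constant as the host count grows.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical monthly inputs; replace with your own quotes and salaries."""
    hosts: int
    saas_cost_per_host: float       # assumed blended SaaS price per host per month
    oss_infra_cost_per_host: float  # assumed self-hosted compute/storage per host per month
    platform_engineers: float       # assumed FTEs needed to run the OSS stack
    engineer_cost: float            # assumed fully loaded monthly cost per FTE


def monthly_costs(s: Scenario) -> tuple[float, float]:
    """Return (buy, build) monthly cost for the scenario."""
    buy = s.hosts * s.saas_cost_per_host
    build = s.hosts * s.oss_infra_cost_per_host + s.platform_engineers * s.engineer_cost
    return buy, build


def break_even_hosts(s: Scenario) -> float:
    """Host count above which self-hosting becomes cheaper than buying."""
    saving_per_host = s.saas_cost_per_host - s.oss_infra_cost_per_host
    if saving_per_host <= 0:
        return float("inf")  # self-hosting never pays off under these inputs
    return s.platform_engineers * s.engineer_cost / saving_per_host


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    small = Scenario(hosts=40, saas_cost_per_host=70.0,
                     oss_infra_cost_per_host=20.0,
                     platform_engineers=0.5, engineer_cost=15000.0)
    buy, build = monthly_costs(small)
    print(f"buy={buy:.0f} build={build:.0f} "
          f"break-even at {break_even_hosts(small):.0f} hosts")
```

With these placeholder inputs the small team sits well below the break-even host count, which is the situation where buying is the cheaper option. Plugging in your own per-host quotes and staffing estimates moves that threshold.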
Lifecycle
2026-06-15T09:54:31.941395+00:00— report_created — created