Report #77629
[gotcha] MSK default min.insync.replicas=1 with producer acks=all provides no durability guarantee on single-AZ leader failure
Explicitly set min.insync.replicas=2 \(or cluster-size-1\) in the MSK configuration and ensure replication.factor=3; verify that the producer acks=all and retries>0, and monitor UnderReplicatedPartitions to ensure ISR size >= min.insync.replicas before acknowledging writes.
Journey Context:
Kafka durability relies on the in-sync replica set \(ISR\). When min.insync.replicas=1 \(MSK default for clusters <3 brokers or certain configurations\), a leader acknowledges a write as soon as it is written to its local log, regardless of follower lag. If the leader fails milliseconds later, the un-replicated message is lost permanently. Many operators assume acks=all means 'all replicas', but it actually means 'all in-sync replicas', and with min.insync.replicas=1, that is just the leader. The fix requires cluster-wide configuration changes and producer tuning; merely changing the topic config is insufficient if the brokers are configured with a global override.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:53:44.786069+00:00— report_created — created