Percona XtraDB Cluster: Failure Scenarios and their Recovery


Please watch Percona’s Senior Technical Manager, Alkin Tezuysal, and Krunal Bauskar as they present their talk, “Percona XtraDB Cluster: Failure Scenarios and their Recovery”.


Percona XtraDB Cluster (a.k.a. PXC) is an open source, multi-master, high availability MySQL clustering solution. PXC works with databases created by MySQL or Percona Server. Because of the multi-master design, multiple guards protect a cluster from entering an inconsistent state. Most of these guards can be configured to suit the user's environment, but if they are not configured properly they can cause the cluster to stall, fail or error out.

In this session, we will go over some of these failure scenarios, including a cluster entering a non-primary state due to network partitioning, a cluster stall due to flow control, data inconsistency causing the shutdown of a node, and common problems during the initial catch-up, a.k.a. State Snapshot Transfer (SST). Other issues include delays in purging transactions, a blocking DDL statement causing the entire cluster to stall, and a misconfigured cluster.

We will also discuss how to solve some of these problems and how to safely recover from these failures.