Cluster failure management
If a cluster member fails, you must take different administrative actions, depending on the role of the node in the cluster.
Failure of the primary master
1. Promote a different node to primary master. For detailed steps, see Promoting a node to master.
   You can promote a non-master node to primary master so that the other master nodes in the environment remain available for failover. Alternatively, if there is a secondary master in the environment, you can promote it to primary master. How you do so depends on whether there are tertiary and quaternary masters in the environment:
   - If there are tertiary and quaternary masters, you must take one of the following actions at the same time as you promote the secondary master to primary:
     - Promote a non-master node to secondary master, or
     - Demote the tertiary and quaternary masters to non-master nodes.
     A cluster cannot have tertiary and quaternary masters without a secondary master.
   - If there are no tertiary and quaternary masters, you can promote the secondary master to primary master, and the cluster can operate with a single master. However, for high availability, you might also want to promote a non-master node to secondary master.
2. Remove the failed node from the cluster. For detailed steps, see Remove an unreachable master node from the cluster.
3. Export the signature file from the new primary master. Use this signature file when you add new nodes to the cluster.
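The promotion choices for a failed primary master can be expressed as a small planning function. This is a minimal Python sketch, not a product API. The function name `plan_primary_failover`, the role strings, and the `(action, node, target)` tuple format are hypothetical. Preferring the secondary master by default is also an assumption. Actions grouped in the same inner list are meant to be performed at the same time.

```python
# Hypothetical role labels used only for this sketch.
SECONDARY, TERTIARY, QUATERNARY = "secondary", "tertiary", "quaternary"


def plan_primary_failover(failed, masters, non_masters, prefer_secondary=True):
    """Plan the actions after the primary master fails (illustrative only).

    failed: name of the failed primary master.
    masters: role -> node name for the surviving masters (primary excluded).
    non_masters: surviving non-master node names, most preferred first.
    Returns a list of steps; each step is a list of (action, node, target)
    tuples that should be applied together.
    """
    secondary = masters.get(SECONDARY)
    lower = [masters[r] for r in (TERTIARY, QUATERNARY) if r in masters]

    if secondary is not None and prefer_secondary:
        new_primary = secondary
        first = [("promote", secondary, "primary")]
        if lower:
            # Tertiary/quaternary masters need a secondary master, so refill the
            # secondary role or demote them in the same step as the promotion.
            if non_masters:
                first.append(("promote", non_masters[0], "secondary"))
            else:
                first.extend(("demote", node, "non-master") for node in lower)
    else:
        # Promote a non-master node so the other masters stay in place for failover.
        if not non_masters:
            raise ValueError("no non-master node available to promote")
        new_primary = non_masters[0]
        first = [("promote", new_primary, "primary")]

    return [
        first,
        [("remove", failed, "cluster")],
        [("export_signature", new_primary, "signature file")],
    ]


plan = plan_primary_failover(
    "nodeA",
    {SECONDARY: "nodeB", TERTIARY: "nodeC", QUATERNARY: "nodeD"},
    ["nodeE"],
)
for step in plan:
    print(step)
# [('promote', 'nodeB', 'primary'), ('promote', 'nodeE', 'secondary')]
# [('remove', 'nodeA', 'cluster')]
# [('export_signature', 'nodeB', 'signature file')]
```

If no non-master node is available, the same call demotes the tertiary and quaternary masters in the step that promotes the secondary master. The optional high-availability promotion of a non-master node to secondary master, when there are no tertiary and quaternary masters, is left out for brevity.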
Failure of a secondary, tertiary, or quaternary master
1. On the primary master, demote the failed node.
2. Promote a non-master node to replace the failed master. To keep a valid combination of master nodes, you might need to complete steps 1 and 2 at the same time. For information about valid architectures, see Cluster architecture rules.
3. Remove the failed node from the cluster.
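A validity check shows why demoting a failed master and promoting its replacement might need to happen together. This minimal Python sketch encodes only the rule stated on this page, that tertiary and quaternary masters require a secondary master. It also assumes that every cluster has a primary master. It is not the full rule set from Cluster architecture rules, and the function names and role labels are hypothetical.

```python
def is_valid_master_set(roles):
    """Return True if a set of master roles satisfies the (partial) rules modeled here."""
    if "primary" not in roles:
        # Assumption: a cluster always needs a primary master.
        return False
    if roles & {"tertiary", "quaternary"} and "secondary" not in roles:
        # Tertiary and quaternary masters require a secondary master.
        return False
    return True


def replace_master(masters, failed_role, replacement):
    """Demote the master in failed_role and promote `replacement` into that role.

    masters: role -> node name. Returns (ok_if_sequential, new_masters).
    If ok_if_sequential is False, the state between the demote and the promote
    would be invalid, so both actions must be applied at the same time.
    """
    after_demote = {role: node for role, node in masters.items() if role != failed_role}
    new_masters = {**after_demote, failed_role: replacement}
    if not is_valid_master_set(set(new_masters)):
        raise ValueError("replacement still leaves an invalid combination of masters")
    return is_valid_master_set(set(after_demote)), new_masters


masters = {"primary": "nodeA", "secondary": "nodeB", "tertiary": "nodeC", "quaternary": "nodeD"}
sequential_ok, updated = replace_master(masters, "secondary", "nodeE")
print(sequential_ok)         # False
print(updated["secondary"])  # nodeE
```

Here, demoting the failed secondary master on its own would leave tertiary and quaternary masters without a secondary master. The check therefore reports that the demote and promote steps must be done together.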
Failure of a non-master node
1. On the primary master, unregister the failed node.
2. Optionally, add a node to the cluster to replace the failed node.