Product overview > Availability overview > High availability



Catalog server quorums

Quorum is the minimum number of catalog servers that are necessary to conduct placement operations for the data grid. The minimum number is the full set of catalog servers unless you override the quorum value.


Important terms

The following is a list of terms related to quorum considerations for WebSphere eXtreme Scale.

Topology considerations


Topology considerations

This section explains how WebSphere eXtreme Scale operates across a network that includes unreliable components. Examples of such a network would include a network spanning multiple data centers.


IP address space

WebSphere eXtreme Scale requires a network where any addressable element on the network can connect to any other addressable element on the network unimpeded. This network requirement means that WebSphere eXtreme Scale requires a flat IP address naming space. All the firewalls in the configuration must allow all traffic to flow between the IP addresses and ports that are being used by the JVMs (JVM) that are running WebSphere eXtreme Scale.


Connected LANs

Each LAN is assigned a zone identifier for WebSphere eXtreme Scale requirements. WebSphere eXtreme Scale aggressively heartbeats the JVMs in a single zone. A missed heartbeat results in a failover event if the catalog service has quorum.


Catalog service domain and container servers

A catalog service domain is a collection of similar JVMs. A catalog service domain is a data grid that is composed of catalog servers, and is fixed in size. However, the number of container servers is dynamic. Container servers can be added and removed on demand. If three data centers exist in the configuration, WebSphere eXtreme Scale requires one catalog service JVM for each data center.

The catalog service domain uses a full quorum mechanism. Because of this full quorum mechanism, all members of the data grid must agree on any action.

Container server JVMs are tagged with a zone identifier. The data grid of container JVMs is automatically broken in to small core groups of JVMs. A core group only includes JVMs from the same zone. JVMs from different zones are never in the same core group.

A core group aggressively tries to detect the failure of its member JVMs. The container server JVMs in a core group must never span multiple LANs that are connected with links, like in a wide area network. A core group cannot have container servers in the same zone running in different data centers.


Transport security

Because data centers are normally deployed in different geographical locations, you might want to enable transport security between the data centers for security reasons.

Read about transport layer security for more details.


Failure detection and heartbeating


Failure detection

WebSphere eXtreme Scale detects when processes terminate through abnormal socket closure events. The catalog service is notified immediately when a process terminates. A black out is detected through missed heartbeats. WebSphere eXtreme Scale protects itself against brown out conditions across data centers by using a quorum implementation.


Core group member heartbeating

The catalog service places container JVMs into core groups of a limited size. A core group will try to detect the failure of its members using two methods. If a JVMsocket is closed, that JVM is regarded as dead. Each member also heart beats over these sockets at a rate determined by configuration. If a JVM does not respond to these heartbeats within a configured maximum period of time then the JVM is regarded as dead.

A single member of a core group is always elected to be the leader. The core group leader (CGL) is responsible to periodically tell the catalog service that the core group is alive and to report any membership changes to the catalog service. A membership change can be a JVM failing or a newly added JVM joining the core group.

If the core group leader cannot contact any member of the catalog service domain then it will continue to retry.


Catalog service domain heartbeating

The catalog service domain looks like a private core group with a static membership and a quorum mechanism. It detects failures the same way as a normal core group. However, the behavior is modified to include quorum logic. The catalog service also uses a less aggressive heartbeating configuration.


Core group heartbeating

The catalog service needs to know when container servers fail. Each core group is responsible for determining container JVM failure and reporting this to the catalog service through the core group leader. The complete failure of all members of a core group is also a possibility. If the entire core group has failed, it is the responsibility of the catalog service to detect this loss.

If the catalog service marks a container JVM as failed and the container is later reported as alive, the container JVM will be told to shutdown the WebSphere eXtreme Scale container servers. A JVM in this state is not visible in xsadmin command queries. Messages in the logs of the container JVM indicate that the container JVM has failed. You must manually restart these JVMs.

If a quorum loss event has occurred, heartbeating is suspended until quorum is reestablished.

For more information about configuring heartbeating, see Configure failover detection.


Catalog service quorum behavior

Normally, the members of the catalog service have full connectivity. The catalog service domain is a static set of JVMs. WebSphere eXtreme Scale expects all members of the catalog service to be online. The catalog service only responds to container events while the catalog service has quorum.

If the catalog service loses quorum, it waits for quorum to be reestablished. During the time period in which the catalog service does not have quorum, it ignores events from container servers. Container servers retry any requests rejected by the catalog server during this time as WebSphere eXtreme Scale expects quorum to be reestablished.

WebSphere eXtreme Scale expects to lose quorum for the following reasons:

Stop a catalog server instance using stopOgServer does not cause loss of quorum because the system knows the server instance has stopped, which is different from a JVM failure or brown out.


Quorum loss from JVM failure

A catalog server that fails causes quorum to be lost. If a JVM fails, quorum should be overridden as fast as possible. The failed catalog service cannot rejoin the data grid until quorum has been overridden.


Quorum loss from network brown out

WebSphere eXtreme Scale is designed to expect the possibility of brown outs. A brown out is when a temporary loss of connectivity occurs between data centers. This is usually transient in nature and brown outs should clear within a matter of seconds or minutes. While WebSphere eXtreme Scale tries to maintain normal operation during the brown out period, a brown out is regarded as a single failure event. The failure is expected to be fixed and then normal operation resumes with no WebSphere eXtreme Scale actions necessary.

A long duration brown out can be classified as a blackout only through user intervention. Overriding quorum on one side of the brown out is required in order for the event to be classified as a black out.


Catalog service JVM cycling

If a catalog server is stopped by using the stopOgServer command, then the quorum drops to one less server. This means the remaining servers still have quorum. Restarting the catalog server bumps quorum back to the previous number.


Consequences of lost quorum

If a container JVM was to fail while quorum is lost, recovery will not take place until the brown out recovers or in the case of a black out the customer does an override quorum command. WebSphere eXtreme Scale regards a quorum loss event and a container failure as a double failure, which is a rare event. This means that applications may lose write access to data that was stored on the failed JVM until quorum is restored at which time normal recovery will take place.

Similarly, if you attempt to start a container during a quorum loss event, the container will not start.

Full client connectivity is allowed during quorum loss. If no container failures or connectivity issues happen during the quorum loss event then clients can still fully interact with the container servers.

If a brown out occurs then some clients may not have access to primary or replica copies of the data until the brown out clears.

New clients can be started, as there should be a catalog service JVM in each data center so at least one catalog service JVM can be reached by a client even during a brown out event.


Quorum recovery

If quorum is lost for any reason, when quorum is reestablished, a recovery protocol is executed. When the quorum loss event occurs, all liveness checking for core groups is suspended and failure reports are also ignored. Once quorum is back then the catalog service does a liveness check of all core groups to immediately determine their membership. Any shards previously hosted on container JVMs reported as failed will be recovered at this point. If primary shards were lost then surviving replicas will be promoted to primaries. If replica shards were lost then additional replicas will be created on the survivors.


Ovveride quorum

This should only be used when a data center failure has occurred. Quorum loss due to a catalog service JVM failure or a network brownout should recover automatically once the catalog service JVM is restarted or the network brownout clears.

Administrators are the only ones with knowledge of a data center failure. WebSphere eXtreme Scale treats a brown out and a black out similarly. You must inform the WebSphere eXtreme Scale environment of such failures using the xsadmin command to override quorum. This will tell the catalog service to assume that quorum is achieved with the current membership and full recovery will take place. When issuing an override quorum command, you are guaranteeing that the JVMs in the failed data center have truly failed and will not recover.

The following list considers some scenarios for overriding quorum. Say you have three catalog servers: A, B, and C.


Container behavior during quorum loss

Containers host one or more shards. Shards are either primaries or replicas for a specific partition. The catalog service assigns shards to a container and the container will honor that assignment until new instructions arrive from the catalog service. This means that if a primary shard in a container cannot communicate with a replica shard because of a brown out then it will continue to retry until it receives new instructions from the catalog service.

If a network brown out occurs and a primary shard loses communication with the replica then it will retry the connection until the catalog service provides new instructions.


Synchronous replica behavior

While the connection is broken the primary can accept new transactions as long as there are at least as many replicas online as the minsync property for the map set. If any new transactions are processed on the primary while the link to the synchronous replica is broken, the replica will be cleared and resynchronized with the current state of the primary when the link is reestablished.

Synchronous replication is strongly discouraged between data centers or over a WAN-style link.


Asynchronous replica behavior

While the connection is broken the primary can accept new transactions. The primary will buffer the changes up to a limit. If the connection with the replica is reestablished before that limit is reached then the replica is updated with the buffered changes. If the limit is reached, then the primary destroys the buffered list and when the replica reattaches then it is cleared and resynchronized.


Client behavior during quorum loss

Clients are always able to connect to the catalog server to bootstrap to the data grid whether the catalog service domain has quorum or not. The client will try to connect to any catalog server instance to obtain a route table and then interact with the data grid. Network connectivity may prevent the client from interacting with some partitions due to network setup. The client may connect to local replicas for remote data if it has been configured to do so. Clients will not be able to update data if the primary partition for that data is not available.


Parent topic:

High availability


Related concepts

Replication for availability

High-availability catalog service


Related tasks

Monitor with the xsAdmin sample utility


+

Search Tips   |   Advanced Search