WebSphere eXtreme Scale Product Overview > Availability > Replication for availability > Shard allocation: primary and replica > High-availability catalog service



Catalog server quorums


Quorum is the minimum number of catalog servers necessary to conduct placement operations for the grid. The minimum number is the full set of catalog servers unless quorum has been overridden.


Important terms

The following is a list of terms related to quorum considerations for eXtreme Scale.


Topology

This section explains how IBM WebSphere eXtreme Scale operates across a network that includes unreliable components. Examples of such a network would include a network spanning multiple data centers.


IP address space

WebSphere eXtreme Scale requires a network where any addressable element on the network can connect to any other addressable element on the network unimpeded. This means WebSphere eXtreme Scale requires a flat IP address naming space and it requires all firewalls to all traffic to flow between the IP addresses and ports used by the Java virtual machines (JVM) hosting elements of WebSphere eXtreme Scale.


Connected LANs

Each LAN is assigned a zone identifier for WebSphere eXtreme Scale requirements. WebSphere eXtreme Scale aggressively heartbeats JVMs in a single zone and a missed heartbeat will result in a failover event if the catalog service has quorum. Read about Configure failover detection for more information.


Catalog service grid and container servers

A grid is a collection of similar JVMs. A catalog service is a grid composed of catalog servers, and is fixed in size. However, the number of container servers is dynamic. Container servers can be added and removed on demand. In a three-data-center configuration, WebSphere eXtreme Scale requires one catalog service JVM per data center.

The catalog service domain uses a full quorum mechanism. This means that all members of the grid must agree on any action.

Container server JVMs are tagged with a zone identifier. The grid of container JVMs is automatically broken in to small core groups of JVMs. A core group will only include JVMs from the same zone. JVMs from different zones will never be in the same core group.

A core group will aggressively try to detect the failure of its member JVMs. The container JVMs of a core group must never span multiple LANs connected with links like in a wide area network. This means that a core group cannot have containers in the same zone running in different data centers.


Server life cycle


Catalog server startup

You can use the following methods to start catalog servers:


Quorum configuration methods for catalog servers The quorum mechanism is disabled by default. Use one of the following methods to enable quorum. All the catalog servers must be given the same quorum setting.


Container server startup

The container servers are started using the startOgServer command. When running a grid across data centers the servers must use the zone tag to identify the data center in which they reside. Setting the zone on the grid servers allows WebSphere eXtreme Scale to monitor health of the servers scoped to the data center, minimizing cross-data-center traffic.

# bin/startOgServer gridA0 –serverProps objectGridServer.properties – objectgridfile xml/objectgrid.xml –deploymentpolicyfile xml/deploymentpolicy.xml

objectGridServer.properties file
catalogServiceEndPoints= cat0.domain.com:2809, cat1.domain.com:2809
zoneName=ZoneA


Grid server shutdown

The grid servers are stopped using the stopOgServer command. When shutting down an entire data center for maintenance, pass the list of all the servers that belong to that zone. This will allow a clean transition of state from the zone in teardown to the surviving zone or zones.

# bin/stopOgServer gridA0,gridA1,gridA2 –catalogServiceEndPoints cat0.domain.com:2809,cat1.domain.com:2809


Failure detection

WebSphere eXtreme Scale detects process death through abnormal socket closure events. The catalog service will be told immediately when a process terminates. A black out is detected through missed heartbeats. WebSphere eXtreme Scale protects itself against brown out conditions across data centers by using a quorum implementation.


Heartbeating implementation

This section describes how liveness checking is implemented in WebSphere eXtreme Scale.


Core group member heartbeating

The catalog service places container JVMs into core groups of a limited size. A core group will try to detect the failure of its members using two methods. If a JVM socket is closed, that JVM is regarded as dead. Each member also heart beats over these sockets at a rate determined by configuration. If a JVM does not respond to these heartbeats within a configured maximum period of time then the JVM is regarded as dead.

A single member of a core group is always elected to be the leader. The core group leader (CGL) is responsible to periodically tell the catalog service that the core group is alive and to report any membership changes to the catalog service. A membership change can be a JVM failing or a newly added JVM joining the core group.

If the core group leader cannot contact any member of the catalog service domain then it will continue to retry.


Catalog service domain heartbeating

The catalog service looks like a private core group with a static membership and a quorum mechanism. It detects failures the same way as a normal core group. However, the behavior is modified to include quorum logic. The catalog service also uses a less aggressive heartbeating configuration.


Core group heartbeating

The catalog service needs to know when container servers fail. Each core group is responsible for determining container JVM failure and reporting this to the catalog service through the core group leader. The complete failure of all members of a core group is also a possibility. If the entire core group has failed, it is the responsibility of the catalog service to detect this loss.

If the catalog service marks a container JVM as failed and the container is later reported as alive, the container JVM will be told to shutdown the WebSphere eXtreme Scale container servers. A JVM in this state will not be visible in xsadmin queries. There will be messages in the logs of the container JVM indicating this has happened. These JVMs need to be manually restarted.

If a quorum loss event has occurred, heartbeating is suspended until quorum is reestablished.

For more information about configuring heartbeating, see Configure failover detection.


Catalog service quorum behavior

Normally, the members of the catalog service have full connectivity. The catalog service grid is a static set of JVMs. WebSphere eXtreme Scale expects all members of the catalog service to be online always. The catalog service will only respond to container events while the catalog service has quorum.

If the catalog service loses quorum, it will wait for quorum to be reestablished. While the catalog service does not have quorum, it will ignore events from container servers. Container servers will retry any requests rejected by the catalog server during this time as WebSphere eXtreme Scale expects quorum to be reestablished.

The following message indicates that quorum has been lost. Look for this message in the catalog service logs.

CWOBJ1254W: The catalog service is waiting for quorum.

WebSphere eXtreme Scale expects to lose quorum for the following reasons:

Stop a catalog server instance using stopOgServer does not cause loss of quorum because the system knows the server instance has stopped, which is different from a JVM failure or brown out.


Quorum loss from JVM failure

A catalog server that fails will cause quorum to be lost. In this case, quorum should be overridden as fast as possible. The failed catalog service cannot rejoin the grid until quorum has been overridden.


Quorum loss from network brown out

WebSphere eXtreme Scale is designed to expect the possibility of brown outs. A brown out is when a temporary loss of connectivity occurs between data centers. This is usually transient in nature and brown outs should clear within a matter of seconds or minutes. While WebSphere eXtreme Scale tries to maintain normal operation during the brown out period, a brown out is regarded as a single failure event. The failure is expected to be fixed and then normal operation resumes with no WebSphere eXtreme Scale actions necessary.

A long duration brown out can be classified as a blackout only through user intervention. Overriding quorum on one side of the brown out is required in order for the event to be classified as a black out.


Catalog service JVM cycling

If a catalog server is stopped by using stopOgServer, then the quorum drops to one less server. This means the remaining servers still have quorum. Restarting the catalog server bumps quorum back to the previous number.


Consequences of lost quorum

If a container JVM was to fail while quorum is lost, recovery will not take place until the brown out recovers or in the case of a black out the customer does an override quorum command. WebSphere eXtreme Scale regards a quorum loss event and a container failure as a double failure, which is a rare event. This means that applications may lose write access to data that was stored on the failed JVM until quorum is restored at which time normal recovery will take place.

Similarly, if you attempt to start a container during a quorum loss event, the container will not start.

Full client connectivity is allowed during quorum loss. If no container failures or connectivity issues happen during the quorum loss event then clients can still fully interact with the container servers.

If a brown out occurs then some clients may not have access to primary or replica copies of the data until the brown out clears.

New clients can be started, as there should be a catalog service JVM in each data center so at least one catalog service JVM can be reached by a client even during a brown out event.


Quorum recovery

If quorum is lost for any reason, when quorum is reestablished, a recovery protocol is executed. When the quorum loss event occurs, all liveness checking for core groups is suspended and failure reports are also ignored. Once quorum is back then the catalog service does a liveness check of all core groups to immediately determine their membership. Any shards previously hosted on container JVMs reported as failed will be recovered at this point. If primary shards were lost then surviving replicas will be promoted to primaries. If replica shards were lost then additional replicas will be created on the survivors.


Ovveride quorum

This should only be used when a data center failure has occurred. Quorum loss due to a catalog service JVM failure or a network brownout should recovery automatically once the catalog service JVM is restarted or the network brownout clears.

Administrators are the only ones with knowledge of a datacenter failure. WebSphere eXtreme Scale treats a brown out and a black out similarly. You must inform the eXtreme Scale environment of such failures using the xsadmin command to override quorum. This will tell the catalog service to assume that quorum is achieved with the current membership and full recovery will take place. When issuing an override quorum command, you are guaranteeing that the JVMs in the failed data center have truly failed and will not recover.

The following list considers some scenarios for overriding quorum. Say you have three catalog servers: A, B, and C.


Container behavior

This section describes how the container server JVMs behave while quorum is lost and recovered.

Containers host one or more shards. Shards are either primaries or replicas for a specific partition. The catalog service assigns shards to a container and the container will honor that assignment until new instructions arrive from the catalog service. This means that if a primary shard in a container cannot communicate with a replica shard because of a brown out then it will continue to retry until it receives new instructions from the catalog service.

If a network brown out occurs and a primary shard loses communication with the replica then it will retry the connection until the catalog service provides new instructions.


Synchronous replica behavior

While the connection is broken the primary can accept new transactions as long as there are at least as many replicas online as the minsync property for the map set. If any new transactions are processed on the primary while the link to the synchronous replica is broken, the replica will be cleared and resynchronized with the current state of the primary when the link is reestablished.

Synchronous replication is strongly discouraged between data centers or over a WAN-style link.


Asynchronous replica behavior

While the connection is broken the primary can accept new transactions. The primary will buffer the changes up to a limit. If the connection with the replica is reestablished before that limit is reached then the replica is updated with the buffered changes. If the limit is reached, then the primary destroys the buffered list and when the replica reattaches then it is cleared and resynchronized.


Client behavior

Clients are always able to connect to the catalog server to bootstrap to the grid whether the catalog service domain has quorum or not. The client will try to connect to any catalog server instance to obtain a route table and then interact with the grid. Network connectivity may prevent the client from interacting with some partitions due to network setup. The client may connect to local replicas for remote data if it has been configured to do so. Clients will not be able to update data if the primary partition for that data is not available.


Quorum commands with xsadmin

This section describes xsadmin commands useful for quorum situations. See Use the xsadmin sample utilitythe information about xsadmin in the Administration Guide for more information about xsadmin.


Query quorum status

The quorum status of a catalog server instance can be interrogated using the xsadmin command.

xsadmin –ch cathost –p 1099 –quorumstatus

There are five possible outcomes.


Ovveride quorum

The xsadmin command can be used to override quorum. Any surviving catalog server instance can be used. All survivors are notified when one is told to override quorum. The syntax for this is as follows.

xsadmin –ch cathost –p 1099 –overridequorum


Diagnostic commands


Transport security considerations

Since data centers are normally deployed in different geographical locations, users might want to enable transport security between the data centers for security reasons.

Read about transport layer security for more details.



Parent topic

High-availability catalog service