

Architecture and topology


With WebSphere eXtreme Scale, the architecture can use local in-memory data caching or distributed client-server data caching.

eXtreme Scale requires minimal additional infrastructure to operate. The infrastructure consists of scripts to install, start, and stop a Java EE application on a server. Cached data is stored in the eXtreme Scale server, and clients connect to the server remotely.

Distributed caches offer increased performance, availability, and scalability, and they can be configured using dynamic topologies in which servers are balanced automatically. You can also add servers without restarting the existing eXtreme Scale servers. You can create either simple deployments or large, terabyte-sized deployments that require thousands of servers.


Maps

A map is a container for key-value pairs that allows an application to store a value indexed by a key. Indexes can be added to a map over attributes of the key or the value; the query runtime uses these indexes automatically to determine the most efficient way to run a query.
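
For example, an index over a value attribute can be added with the HashIndex plug-in. The following is a minimal local-grid sketch; the grid, map, and attribute names are illustrative:

    import com.ibm.websphere.objectgrid.BackingMap;
    import com.ibm.websphere.objectgrid.ObjectGrid;
    import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
    import com.ibm.websphere.objectgrid.plugins.index.HashIndex;

    public class IndexSample {
        public static void main(String[] args) throws Exception {
            // Create a local grid and define a map to hold customer values.
            ObjectGrid grid = ObjectGridManagerFactory.getObjectGridManager()
                    .createObjectGrid("SampleGrid");
            BackingMap customers = grid.defineMap("customers");

            // Index the "name" attribute of the cached value; the query
            // runtime can then use this index when evaluating queries.
            HashIndex nameIndex = new HashIndex();
            nameIndex.setName("nameIndex");
            nameIndex.setAttributeName("name");
            customers.addMapIndexPlugin(nameIndex);
        }
    }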

A map set is a collection of maps with a common partitioning algorithm. The data within the maps is replicated based on the policy defined on the map set. A map set is used only for distributed topologies and is not needed for local topologies.

A map set can have a schema associated with it. A schema is the metadata that describes the relationships between each map when using homogeneous Object types or entities.

eXtreme Scale can store serializable Java objects in each of the maps by using the ObjectMap API. A schema can be defined over the maps to identify the relationships between the objects in the maps, where each map holds objects of a single type. Defining a schema for maps is required to query the contents of the map objects. eXtreme Scale can have multiple map schemas defined.
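
A minimal sketch of the ObjectMap API with a local grid follows; the grid, map, key, and value here are illustrative:

    import com.ibm.websphere.objectgrid.ObjectGrid;
    import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
    import com.ibm.websphere.objectgrid.ObjectMap;
    import com.ibm.websphere.objectgrid.Session;

    public class ObjectMapSample {
        public static void main(String[] args) throws Exception {
            // Create a local, in-memory grid and define a map named "customers".
            ObjectGrid grid = ObjectGridManagerFactory.getObjectGridManager()
                    .createObjectGrid("SampleGrid");
            grid.defineMap("customers");

            // All ObjectMap work happens inside a session transaction.
            Session session = grid.getSession();
            ObjectMap customers = session.getMap("customers");

            session.begin();
            customers.insert("C001", "Jane Doe"); // key and value must be serializable
            session.commit();

            session.begin();
            String name = (String) customers.get("C001");
            session.commit();
            System.out.println(name);
        }
    }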

eXtreme Scale can also store entities using the EntityManager API. Each entity is associated with a map.

The schema for an entity map set is automatically discovered using either an entity descriptor XML file or annotated Java classes. Each entity has a set of key attributes and a set of non-key attributes. An entity can also have relationships to other entities. WebSphere eXtreme Scale supports one-to-one, one-to-many, many-to-one, and many-to-many relationships.

Each entity is physically mapped to a single map in the map set. Entities allow applications to have complex object graphs that span multiple maps. A distributed topology can have multiple entity schemas.
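
As a sketch, entities can be defined as annotated Java classes; the class and attribute names here are illustrative. The annotated classes are registered with the grid (for example, with ObjectGrid.registerEntities), and the EntityManager API, obtained from Session.getEntityManager, can then persist and find them within a transaction:

    import java.util.Collection;

    import com.ibm.websphere.projector.annotations.Entity;
    import com.ibm.websphere.projector.annotations.Id;
    import com.ibm.websphere.projector.annotations.OneToMany;

    // Each entity is stored in its own map. Customer has a key attribute
    // (customerId), a non-key attribute (name), and a one-to-many
    // relationship to the Order entity.
    @Entity
    public class Customer {
        @Id String customerId;
        String name;

        @OneToMany
        Collection<Order> orders;
    }

    @Entity
    class Order {
        @Id String orderId;
        double total;
    }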


Containers, partitions, and shards

The container is a service that stores application data for the grid. This data is generally broken into partitions and hosted across multiple containers; each container in turn hosts a subset of the complete data. A JVM might host one or more containers, and each container can host multiple shards.

Because the containers host all of the data, plan the heap size for the containers and configure the JVM heap settings accordingly.

Partitions host a subset of the data in the grid. eXtreme Scale automatically places multiple partitions in a single container and spreads the partitions out as more containers become available.

Choose the number of partitions carefully before final deployment, because the number of partitions cannot be changed dynamically. A hash mechanism is used to locate partitions in the network, and eXtreme Scale cannot rehash the entire data set after it is deployed. As a general rule, overestimate the number of partitions.

Shards are instances of partitions and have one of two roles: primary or replica. The primary shard and its replicas make up the physical manifestation of the partition. A partition has one or more shards, each of which hosts all of the data contained in that partition. One shard is the primary, and the others are replicas, which are redundant copies of the data in the primary shard. A primary shard is the only partition instance that allows transactions to write to the cache. A replica shard is a "mirrored" instance of the partition that receives updates synchronously or asynchronously from the primary shard; it allows transactions only to read from the cache. Replicas are never hosted in the same container as the primary and are not normally hosted on the same machine as the primary.

To increase the availability of the data or to increase persistence guarantees, replicate the data. However, replication adds cost to each transaction, trading performance for availability. With eXtreme Scale, you can control this cost because both synchronous and asynchronous replication are supported, as well as hybrid models that combine the two modes. A synchronous replica shard receives updates as part of the transaction of the primary shard to guarantee data consistency; a synchronous replica can double the response time because the transaction must commit on both the primary and the synchronous replica before it is complete. An asynchronous replica shard receives updates after the transaction commits to limit the impact on performance, but it introduces the possibility of data loss because the asynchronous replica can be several transactions behind the primary.
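
The number of partitions and the number of synchronous and asynchronous replicas per partition are set in the deployment policy descriptor. The following minimal sketch, with illustrative grid, map set, and map names, defines 13 partitions, each with a primary shard, one synchronous replica, and one asynchronous replica:

    <?xml version="1.0" encoding="UTF-8"?>
    <deploymentPolicy xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://ibm.com/ws/objectgrid/deploymentPolicy deploymentPolicy.xsd">

      <objectgridDeployment objectgridName="SampleGrid">
        <!-- 13 partitions; each partition has one synchronous and one
             asynchronous replica in addition to its primary shard. -->
        <mapSet name="sampleMapSet" numberOfPartitions="13"
                minSyncReplicas="1" maxSyncReplicas="1" maxAsyncReplicas="1">
          <map ref="customers"/>
        </mapSet>
      </objectgridDeployment>
    </deploymentPolicy>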


Clients

Clients connect to a catalog service, retrieve a description of the server topology, and then communicate directly with each server as needed. When the server topology changes because new servers are added or existing servers fail, the dynamic catalog service routes the client to the appropriate server that is hosting the data. Clients examine the keys of application data to determine which partition to route each request to. Clients can read data from multiple partitions in a single transaction; however, clients can update only a single partition in a transaction. After the client updates an entry, the client transaction must use that partition for all further updates.
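
A minimal client sketch follows; the catalog service address, grid name, and map name are illustrative. The client connects to the catalog service once and then routes requests directly to the containers:

    import com.ibm.websphere.objectgrid.ClientClusterContext;
    import com.ibm.websphere.objectgrid.ObjectGrid;
    import com.ibm.websphere.objectgrid.ObjectGridManager;
    import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
    import com.ibm.websphere.objectgrid.ObjectMap;
    import com.ibm.websphere.objectgrid.Session;

    public class ClientSample {
        public static void main(String[] args) throws Exception {
            ObjectGridManager manager = ObjectGridManagerFactory.getObjectGridManager();

            // Connect to the catalog service; the client retrieves the server
            // topology from it and then communicates with containers directly.
            ClientClusterContext context = manager.connect("catalogHost:2809", null, null);

            // Obtain a client-side proxy for the grid named "SampleGrid".
            ObjectGrid grid = manager.getObjectGrid(context, "SampleGrid");

            Session session = grid.getSession();
            ObjectMap customers = session.getMap("customers");

            // Reads can span partitions, but all updates in one transaction
            // must stay within a single partition.
            session.begin();
            customers.update("C001", "Jane Q. Doe"); // updates an existing entry
            session.commit();
        }
    }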



Catalog service

The catalog service hosts logic that is typically idle during steady state and has little influence on scalability. It is built to service hundreds of containers that become available simultaneously, and it runs services to manage the containers.

The responsibilities of the catalog service consist of the following services:

Location service

The location service provides locality for clients that are looking for containers hosting applications and for containers that are looking to register hosted applications with the placement service. The location service runs in all of the grid members to scale out this function.

Placement service

The placement service is the central nervous system of the grid and is responsible for allocating individual shards to their host containers. The placement service runs as a one-of-N elected service in the cluster, so exactly one instance of the placement service is always running. If that instance stops, another process takes over. For redundancy, all of the state of the catalog service is replicated across every server that hosts the catalog service.

Core group manager

The core group manager manages peer grouping for health monitoring, organizes containers into small groups of servers, and automatically federates the groups of servers. When a container first contacts the catalog service, the container waits to be assigned to either a new or an existing group of several Java virtual machines (JVMs). Each group of JVMs monitors the availability of its members through heartbeating. One of the group members relays availability information to the catalog service, which reacts to failures by reallocating shards and forwarding routes.

Administration

The four stages of administering the eXtreme Scale environment are as follows:

  • planning
  • deploying
  • managing
  • monitoring

For availability, configure a catalog service grid. A catalog service grid consists of multiple Java virtual machines, including a master JVM and a number of backup Java virtual machines.
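
For example, a catalog service grid of three JVMs might be started with the startOgServer script, passing the same list of catalog service endpoints (in serverName:hostName:clientPort:peerPort form) to each; the server names, host names, and ports here are illustrative:

    startOgServer.sh cs1 -catalogServiceEndPoints cs1:host1:6601:6602,cs2:host2:6601:6602,cs3:host3:6601:6602
    startOgServer.sh cs2 -catalogServiceEndPoints cs1:host1:6601:6602,cs2:host2:6601:6602,cs3:host3:6601:6602
    startOgServer.sh cs3 -catalogServiceEndPoints cs1:host1:6601:6602,cs2:host2:6601:6602,cs3:host3:6601:6602

Each command runs on its corresponding host (host1, host2, and host3).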


