


Grids, partitions, and shards


An eXtreme Scale distributed grid is divided into partitions. A partition holds an exclusive subset of the data. A partition is made up of one or more shards: a primary shard and, optionally, one or more replica shards.

You do not need to have replica shards in a partition, but replica shards provide high availability. Whether the deployment is an independent in-memory data grid or an in-memory database processing space, data access in eXtreme Scale relies heavily on sharding concepts.

The data for a partition is stored at run time in a set of shards. This set of shards includes a primary shard and possibly one or more replica shards. A shard is the smallest unit that eXtreme Scale can add to or remove from a JVM.

Two placement strategies exist: fixed-partition placement (FIXED_PARTITIONS) and per-container placement (PER_CONTAINER).

The following discussion focuses on the usage of the FIXED_PARTITIONS strategy.


Number of shards

If an environment has ten partitions that hold one million objects with no replicas, then ten shards exist, each storing 100,000 objects.

If you add a replica to this scenario, an extra shard exists in each partition. In this case, 20 shards exist: ten primary shards and ten replica shards. Each of these shards stores 100,000 objects.
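The arithmetic behind these counts can be reproduced in a few lines. The following Java sketch is illustrative only; the class name and variables are hypothetical and are not part of the eXtreme Scale API. It computes the total shard count and the objects per shard for the ten-partition, one-replica example.

// Illustrative sketch only; not part of the eXtreme Scale API.
public class ShardCountExample {
    public static void main(String[] args) {
        int partitions = 10;          // configured partitions
        int replicasPerPartition = 1; // N replica shards per partition
        long totalObjects = 1_000_000L;

        // Each partition produces one primary shard plus N replica shards.
        int totalShards = partitions * (1 + replicasPerPartition);

        // Objects are spread across partitions; every shard of a partition
        // holds a full copy of that partition's data.
        long objectsPerShard = totalObjects / partitions;

        System.out.println("Total shards: " + totalShards);          // 20
        System.out.println("Objects per shard: " + objectsPerShard); // 100000
    }
}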

Each partition consists of a primary shard and zero or more (N) replica shards. Determining the optimal shard count is critical. If you configure too few shards, data is not distributed evenly among the JVMs that host them, which can result in out-of-memory errors and processor overloading.

To avoid these issues, plan for at least ten shards for each JVM as you scale.

When you initially deploy the grid, you typically configure a large number of partitions so that the grid can scale out to additional JVMs later.


Number of shards per JVM


Scenario: Small number of shards for each JVM

Data is added to and removed from a JVM in units of whole shards; shards are never split into pieces. If 10 GB of data exists and 20 shards exist to hold this data, then each shard holds 500 MB of data on average.

If nine JVMs host the grid, then on average each JVM has two shards. Because 20 is not evenly divisible by 9, two JVMs have three shards, and the remaining seven have two.

Because each shard holds 500 MB of data, the distribution of data is unequal. The seven JVMs with two shards each host 1 GB of data. The two JVMs with three shards have 50% more data, or 1.5 GB, which is a much larger memory burden. Because these two JVMs are hosting three shards, they also receive 50% more requests for their data.

As a result, a small number of shards for each JVM causes imbalance.
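To make the imbalance concrete, the following Java sketch (illustrative only; the class name is hypothetical and the values are taken from this scenario, not from any eXtreme Scale API) distributes 20 shards of 500 MB across nine JVMs and reports the skew.

// Illustrative sketch only; shows how 20 shards of 500 MB spread over 9 JVMs.
public class SmallShardCountExample {
    public static void main(String[] args) {
        int shards = 20;
        int jvms = 9;
        int shardSizeMB = 500;

        int base = shards / jvms;      // 2 shards on most JVMs
        int extra = shards % jvms;     // 2 JVMs carry one extra shard

        int lightJvmMB = base * shardSizeMB;        // 1000 MB
        int heavyJvmMB = (base + 1) * shardSizeMB;  // 1500 MB
        double skew = 100.0 * (heavyJvmMB - lightJvmMB) / lightJvmMB;

        System.out.println((jvms - extra) + " JVMs hold " + lightJvmMB + " MB each");
        System.out.println(extra + " JVMs hold " + heavyJvmMB + " MB each");
        System.out.printf("Memory and request skew: %.0f%%%n", skew); // 50%
    }
}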

To improve performance, increase the number of shards for each JVM.


Scenario: Increased number of shards per JVM

Now consider a much larger number of shards: 101 shards, with nine JVMs hosting 10 GB of data. In this case, each shard holds 99 MB of data on average. The JVMs have the following distribution of shards: seven JVMs host 11 shards each, and two JVMs host 12 shards each.

The two JVMs with 12 shards now have just 99 MB more data than the other JVMs, a 9% difference. This distribution is much more even than the 50% difference in the scenario with a small number of shards.

From a processor use perspective, only 9% more work exists for the two JVMs with 12 shards compared to the seven JVMs with 11 shards. By increasing the number of shards in each JVM, the data and processor use are distributed fairly and evenly.
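The same arithmetic, repeated below for 101 shards of roughly 99 MB each (again an illustrative sketch with a hypothetical class name, not an eXtreme Scale API), shows how the skew drops to about 9%.

// Illustrative sketch only; 101 shards of ~99 MB spread over 9 JVMs.
public class LargerShardCountExample {
    public static void main(String[] args) {
        int shards = 101;
        int jvms = 9;
        int shardSizeMB = 99;

        int base = shards / jvms;   // 11 shards on most JVMs
        int extra = shards % jvms;  // 2 JVMs carry a 12th shard

        int lightJvmMB = base * shardSizeMB;        // 1089 MB
        int heavyJvmMB = (base + 1) * shardSizeMB;  // 1188 MB
        double skew = 100.0 * (heavyJvmMB - lightJvmMB) / lightJvmMB;

        System.out.println((jvms - extra) + " JVMs hold " + lightJvmMB + " MB each");
        System.out.println(extra + " JVMs hold " + heavyJvmMB + " MB each");
        System.out.printf("Memory and request skew: %.0f%%%n", skew); // ~9%
    }
}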

When you create the system, use 10 shards for each JVM in the maximally sized scenario, that is, when the system is running its maximum number of JVMs in the planning horizon.
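Working backward from that guideline, you can estimate a partition count for a planned configuration. The following sketch applies the 10-shards-per-JVM rule stated above; it is an assumption-based sizing aid with example values and a hypothetical class name, not an official eXtreme Scale sizing tool.

// Illustrative sizing sketch based on the 10-shards-per-JVM guideline above;
// not an official eXtreme Scale sizing tool.
public class PartitionSizingExample {
    public static void main(String[] args) {
        int maxJvms = 9;               // maximum JVMs in the planning horizon
        int replicasPerPartition = 1;  // N replica shards per partition
        int shardsPerJvm = 10;         // guideline from this topic

        int targetShards = maxJvms * shardsPerJvm;
        // Each partition contributes (1 + N) shards, so divide to get partitions.
        int partitions = (int) Math.ceil(
                (double) targetShards / (1 + replicasPerPartition));

        System.out.println("Target shard count: " + targetShards); // 90
        System.out.println("Suggested partitions: " + partitions); // 45
    }
}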



