Understand shards
Learn about shards in the MPF Operational Analytics.
It is important to carefully consider setting the number of shards when you set up a cluster. The number of shards can be set only once, by the first node in the cluster. If the number of shards must be changed later, completely reindex all of the data that is stores in the MPF Operational Analytics.
Ideally, the number of shards is equal to the maximum number of nodes that the cluster eventually expands to. Because the maximum number of nodes needed is often unknown at installation time, it is a common practice to create more shards than needed.
The following images show how sharding works.
Here is a cluster with one node and six shards. Because there is only one node, all six shards live on the same node. The single node handles all requests and data processing.
After several months of use, requests made to the cluster begin to perform poorly. It is determined that a single node is no longer adequate to handle processing for all of the incoming data. A new node is added to the cluster.
After a new node is added, the shards are automatically evenly distributed across all of the nodes.
Now when a request comes in to either node, the request is forwarded to the node that has the shard containing the data. The data indexing and processing is now split between the two nodes. Because the requests and data processing is now split between the nodes, the performance and response times improve.
After several months, requests begin to slow down again, and it is determined that a third node is required.
The shards are split between the three nodes.
This process repeats itself until there are six nodes, which each contains one shard. It is now no longer possible to add more nodes because each shard contains only one node.
If it is determined that six nodes are no longer sufficient to handle the incoming data load, a new cluster must be set up. The data must then be reindexed with a larger shard limit.
It is important to understand that the distribution of the shards happens automatically. The only configuration that must be made for shards is specifying the number of shards at installation time.
A small performance hit comes with having more than one shard per node. Although this performance hit is often negligible, the cluster should not be configured with an arbitrarily large number of shards.
Parent topic: Clustering terminology