Core group migration considerations
High availability manager and core group functionality is provided in v6 and higher. This topic discusses core group configuration and topology considerations that might impact your migration if you are migrating from a version of the product that does not contain this functionality, such as v5.1.
Before reading this article, you should understand the basic core group concepts contained in the following topics:
- "High availability manager"
- "When to use a high availability manager"
- "Core groups"
- "Core group transports"
- "Core group coordinator"
- "Core group administration considerations"
- "Core group scaling considerations"
Because special planning and migration activities might be required for the environment, before migrating from a version of the product that does not have a high availability manager to one that does have a high availability manager, you should know the answers to the following questions:
- What is the appropriate core group configuration and topology for a cell after it is fully migrated to v7.0?
- In a mixed environment, is the v5.x cell configured to use memory-to-memory replication? If so, how is the replication environment migrated, and how is replication affected during the migration process?
- Which, if any, JMS provider is used in the v5.x cell?
- Which, if any, JMS provider is used in the v7.0 cell?
- If the v7.0 default IBM messaging provider is to be used in the v7.0 cell, should the messaging provider be configured to be made highly available?
- Should transaction log recovery be configured in the v7.0 cell?
Default core group related migration activities
Core group related activities that are automatically performed during the migration process are relatively simple and straightforward. When you migrate from a Version 5.x environment to a v7.0 environment, the following actions occur in the indicated order:
- The dmgr is migrated to v7.0.
- During the migration process, a v7.0 dmgr profile and a core group named DefaultCoreGroup are created.
- The new dmgr process is added to DefaultCoreGroup.
- Version 5.x nodes are migrated one by one to v7.0. As each of the nodes is migrated, the node agent and the application servers on the migrating node are added to the DefaultCoreGroup.
When the migration finishes, all of the processes in the Version 5.x cell are members of the v7.0 DefaultCoreGroup. Because the DefaultCoreGroup is configured with the default values for all settings, the migration process does not configure preferred coordinator servers and does not partition the coordinator across multiple servers. The migration process also does not create new high availability policies, and does not provide for any custom property configuration overrides.
Plan the Core Group Topology
For most Version 5.x topologies, a default migration yields an appropriate and workable Version 7.0 core group topology. In some cases, you might need to make minor changes to the topology, such as setting a non-default transport or configuring the core group for replication. If the v5.x cell is large enough to require multiple core groups in the v7.0 topology, do more planning before you start the migration process to prevent application outages when you make your core group configuration changes.
Migrating a large v5.x cell to v7.0, where multiple core groups are required, can be a complex task. When the v7.0 topology requires multiple core groups, you have a number of options as to how and when to partition the cell into multiple core groups. The approach you take should be based on factors such as the number of processes in the cell and the requirements for continuous application availability. For example, while the normal recommendation is to keep core groups at around 50 members, the practical limit is somewhat higher than 50. For topologies with a small number of applications installed on high-end machines (large CPUs with a lot of memory), you might be able to have core groups of up to 200 members. If there are 150 processes in the cell and application availability is not an issue, one option might be to simply migrate the entire cell to v7.0, and then create additional core groups. If application availability is an issue, you should create the additional core groups during the migration process so that you do not have to stop and restart core group members after the migration process completes.
Core Group Size
The most important planning consideration is the size of your core group. By default, there is normally one core group per cell. Because core groups do not scale to large sizes, if your v5.x cell is large, you might want to create additional core groups for your v7.0 topology. You might also need to set up core group bridges if these multiple core groups need to communicate with each other.
Core Group Transport
If a change is made to the core group transport configuration, all core group members must be restarted before the change takes effect. Therefore, planning is required to minimize the effect of changing the transport. If the transport for the DefaultCoreGroup is changed, the best time to change it is immediately after migrating the Deployment Manager, because at that point only the Deployment Manager needs to be restarted. If other core groups are created, configure the transport properly as the new core groups are created.
Custom Property Configuration Overrides
A number of core group configuration parameters can be changed via Custom Property overrides. The available custom property overrides are documented in other Information Center articles in this section. Whenever a Custom Property override is added, removed, or changed, all core group members must be restarted to pick up the change. Therefore, planning is required to minimize the effect of changing Custom Properties. If Custom Properties must be changed for the DefaultCoreGroup, the best time to change them is immediately after migrating the Deployment Manager. If other core groups are created, change the Custom Properties as the new core groups are created.
Core Group Coordinator
Configuring preferred coordinator servers is a best practice. Because the HA Manager can dynamically reread and apply core group coordinator configuration changes, you do not need to restart all core group members to pick up this change.
Example: A Large Cell Migration
The following example illustrates some of the thought processes that you should go through as you plan for and execute the migration of a large v5.x cell to Version 7.0, where multiple core groups are required. For the purpose of this example, assume your v5.x cell has the following topology characteristics:
- The cell contains eight nodes, named Node1, Node2, Node3, ..., Node8, not including the dmgr node.
- The cell contains ten clusters, named Cluster1, Cluster2, Cluster3, ..., Cluster10.
- Clusters Cluster1 through Cluster9 each contain 32 application servers. The cluster members for these clusters are distributed symmetrically, four application servers per node, across all nodes.
- Cluster10 contains 28 application servers. Cluster10 does not have any application servers on Node1. The application servers for Cluster10 are distributed symmetrically, four application servers per node, across nodes Node2 through Node8.
- There are a total of 316 application servers, 8 node agents and a dmgr in the cell.
- Each cluster has an application deployed to it that uses EJBs, and these applications can communicate with each other. Therefore, Work Load Management (WLM) routing information must be available everywhere in the cell.
- Applications must be continuously available during the migration.
- The migration is performed over a period of days or weeks.
The first things to consider in planning the v7.0 core group topology are that this cell contains 325 processes, and that continuous availability of applications is a requirement. These factors prevent you from simply migrating the entire cell and then reconfiguring the core groups. You must distribute the processes contained in the cell among multiple core groups as part of the migration process.
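The process count for this example cell can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the variable names are invented for this example and are not part of any product API.

```python
# Tally the processes in the example v5.x cell.
# All names are illustrative; the numbers come from the topology list above.
SERVERS_PER_NODE_PER_CLUSTER = 4

nodes = 8
full_clusters = 9           # Cluster1 through Cluster9 span all eight nodes
cluster10_nodes = 7         # Cluster10 spans Node2 through Node8 only

app_servers = (full_clusters * nodes * SERVERS_PER_NODE_PER_CLUSTER
               + cluster10_nodes * SERVERS_PER_NODE_PER_CLUSTER)
node_agents = nodes         # one node agent per node
dmgr = 1

total_processes = app_servers + node_agents + dmgr
print(app_servers, total_processes)  # 316 325
```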
When determining how to distribute the v5.x cell processes among the new core groups, make sure that each core group adheres to the following core group rules:
- Each core group must contain at least one administrative process. Because the cell in this example has nine administrative processes (eight node agents and the dmgr), the maximum number of core groups possible in this topology is nine.
- All members of a cluster must be members of the same core group.
- The number of processes contained in each core group should not exceed the recommended size of about 50 members.
Following these rules for this example:
- At least one of the core groups must contain two clusters, because you can only split the cell into a maximum of nine core groups, and there are ten clusters in the v5.x cell.
- Any core group that contains multiple clusters will have more than 50 members, because each cluster contains either 28 or 32 application servers.
While the number of members in at least one core group will exceed the recommended limit, the number of members is well within the practical limit, and should not create a problem.
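The reasoning behind these two conclusions can be written out as a short calculation. This is a sketch under the assumptions of the example cell only; the variable names are illustrative.

```python
import math

admin_processes = 8 + 1            # eight node agents plus the dmgr
cluster_sizes = [32] * 9 + [28]    # Cluster1..Cluster9, then Cluster10
RECOMMENDED_SIZE = 50

# Rule 1: every core group needs at least one administrative process,
# so the number of administrative processes caps the number of core groups.
max_core_groups = admin_processes

# Rule 2: a cluster cannot be split across core groups. With more clusters
# than core groups, some core group must hold at least this many clusters.
min_clusters_in_fullest_group = math.ceil(len(cluster_sizes) / max_core_groups)

# Smallest possible application server count for a core group with two
# clusters (administrative processes would add to this number).
smallest_two_cluster_group = sum(sorted(cluster_sizes)[:2])

print(max_core_groups)                                  # 9
print(min_clusters_in_fullest_group)                    # 2
print(smallest_two_cluster_group)                       # 60
print(smallest_two_cluster_group > RECOMMENDED_SIZE)    # True
```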
Because the applications in this example require the WLM routing information for each cluster contained in the cell, core group bridges must be set up to enable communication between all of the core groups. (Refer to the core group bridge topics if you are not familiar with how to set up a core group bridge.) An appropriate core group bridge topology for this example includes:
- A core group access point for each core group. Each access point contains the set of processes that provide the bridge interfaces for the core group. The bridge interfaces are the processes that actually communicate with processes in other core groups.
- Two bridge interfaces for each access point to eliminate the core group bridge as a single point of failure. These two bridge interfaces will also be placed on different nodes to further ensure continual availability.
When you select processes to serve as the bridge interfaces, remember that bridge interfaces need extra memory and CPU cycles. Normally node agents are good processes to use as bridge interfaces because during normal operations a node agent has a lower workload than an application server or the dmgr.
However, in this example, there are only eight node agents available to serve as bridge interfaces. Because the topology requires two bridge interfaces per access point, if you only use node agents as bridge interfaces, you are limited to four access points, and consequently four core groups. Therefore, before starting the migration process, you might want to create eight stand-alone servers that act specifically as bridge interfaces and do not host applications. Then each access point can contain one node agent and one stand-alone bridge interface server. This setup gives you a total of eight access points and eight core groups.
- A single core group access point group that contains all of the access points. A single core group access point group ensures that all bridge interface processes can communicate directly. These bridge interfaces form a fully connected mesh.
An alternative topology is to use multiple access point groups, which results in a chain topology. In a chain topology communication is forwarded from one bridge interface to another through intermediate bridge interfaces along the chain.
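The bridge interface arithmetic, and the difference between the mesh and chain topologies, can be sketched as follows. The hop counts are a deliberately simplified model (core groups linked end to end in a single line) that is assumed here for illustration; the function names are hypothetical.

```python
BRIDGES_PER_ACCESS_POINT = 2   # two bridge interfaces avoid a single point of failure

def access_points_supported(bridge_candidates):
    """How many access points can each receive two bridge interfaces."""
    return bridge_candidates // BRIDGES_PER_ACCESS_POINT

def max_hops_mesh(core_groups):
    """Single access point group: bridge interfaces communicate directly."""
    return 1 if core_groups > 1 else 0

def max_hops_chain(core_groups):
    """Simplified chain model: messages are forwarded from end to end."""
    return max(core_groups - 1, 0)

node_agents = 8
standalone_bridge_servers = 8

print(access_points_supported(node_agents))                              # 4
print(access_points_supported(node_agents + standalone_bridge_servers))  # 8
print(max_hops_mesh(8), max_hops_chain(8))                               # 1 7
```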
Now that you have determined the setup for your core group bridge interfaces, you are ready to decide how to distribute the ten clusters, eight node agents, eight stand-alone bridge interface servers, and the dmgr across your eight core groups. You want to distribute the processes as evenly as possible across the eight core groups. The following topology is a good example of how to evenly distribute the processes contained in the v5.x cell:
- The first core group, DefaultCoreGroup, contains the dmgr, the node agent from Node1, the bridge server from Node2, and Cluster1.
- Core Group 2 contains the node agent from Node2, the bridge server from Node3, and Cluster2.
- Core Group 3 contains the node agent from Node3, the bridge server from Node4, and Cluster3.
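To see what such a distribution looks like in numbers, the pattern of the three core groups listed above can be extended to all eight. Note that this extension is an assumption made for illustration: the names CoreGroup2 through CoreGroup8, the wrap-around placement of the bridge servers, and the placement of Cluster9 are hypothetical. Only the placement of Cluster10 in the eighth core group is taken from the migration flows later in this topic.

```python
CLUSTER_SIZES = {"Cluster%d" % n: 32 for n in range(1, 10)}
CLUSTER_SIZES["Cluster10"] = 28

def build_plan(extra_clusters):
    """Build a hypothetical eight core group plan.

    extra_clusters maps a core group index (1 to 8) to one additional
    cluster, because ten clusters must fit into eight core groups.
    """
    plan = {}
    for i in range(1, 9):
        name = "DefaultCoreGroup" if i == 1 else "CoreGroup%d" % i
        clusters = ["Cluster%d" % i]
        if i in extra_clusters:
            clusters.append(extra_clusters[i])
        plan[name] = {
            "has_dmgr": i == 1,
            "node_agent": "Node%d" % i,
            "bridge_server_node": "Node%d" % (i % 8 + 1),  # next node, wrapping around
            "clusters": clusters,
        }
    return plan

def group_size(group):
    """Members: optional dmgr, one node agent, one bridge server, cluster members."""
    admin_and_bridge = int(group["has_dmgr"]) + 2
    return admin_and_bridge + sum(CLUSTER_SIZES[c] for c in group["clusters"])

# Assumed placement of the two leftover clusters.
plan = build_plan({7: "Cluster9", 8: "Cluster10"})
sizes = {name: group_size(group) for name, group in plan.items()}
for name, size in sizes.items():
    print(name, size)
```

Under these assumptions, the six single-cluster core groups stay at 34 or 35 members, and the two core groups that hold two clusters have 66 and 62 members, above the recommended size but within the practical limit discussed earlier.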
The default transport in this example does not need to change.
Because this example does not indicate that you will need more than one coordinator per core group, you can leave the coordinator setting at the default value of 1. However, you might want to make the stand-alone bridge interface server in each core group the preferred coordinator server for that core group. This designation initially keeps the workload required of a coordinator away from the clustered application servers that are running applications.
Your migration plan
If, after reviewing the preceding example and completing the initial planning process for the cell you are migrating, you determine that the default migration flow is not appropriate for your target v7.0 topology, it is time to develop a plan or a road map for the actual migration process. This plan should include all necessary extra core group related steps for migrating from v5.x to v7.0, and answers to the following questions:
When will you create the new core groups?
The best time to create the new core groups is immediately after the dmgr migration completes. As the new core groups are created, you should configure the previously mentioned custom properties. You can use either the administrative console or the createCoreGroup wsadmin command to create your new core groups. However, use the administrative console to configure the custom properties.
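If you script this step, a small helper can generate the wsadmin (Jacl) commands for the additional core groups. This is a sketch only: the core group names are the ones assumed in the example, and the -coreGroupName parameter should be verified against the createCoreGroup command reference for your product version before use.

```python
def create_core_group_commands(total_core_groups):
    """Return wsadmin Jacl command strings for CoreGroup2..CoreGroupN.

    DefaultCoreGroup already exists after the dmgr migration, so
    numbering starts at 2. The names are illustrative.
    """
    commands = [
        "$AdminTask createCoreGroup {-coreGroupName CoreGroup%d}" % n
        for n in range(2, total_core_groups + 1)
    ]
    commands.append("$AdminConfig save")  # persist the configuration changes
    return commands

for line in create_core_group_commands(8):
    print(line)
```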
What actions do you perform as nodes are migrated?
As each node is migrated, you should:
- Create the new stand-alone application server that is to be one of your core group bridge interfaces.
- Adjust the transport buffer size on all processes on the node. A script is the best option for performing this action.
- Adjust the heap size on the node agent and the stand-alone server, and turn on verbose GC for these processes.
All of these changes must be completed before you restart the migrated node. You can use the administrative console to make these changes, and then perform a manual synchronization of the node's configuration before restarting the node agent and application servers.
When and how are processes moved to new core groups?
By default, the migration process places all processes in the core group named DefaultCoreGroup. At some point, the number of members in this core group will exceed the size limits, and you must redistribute the processes to other core groups. It is important to understand that processes must be stopped before they can be moved. If continuous application availability is required, carefully plan the order in which you move the processes to different core groups. You can use either the administrative console or the moveServerToCoreGroup wsadmin command to move the dmgr, node agents, and stand-alone application servers.
Moving clustered application servers is more complicated. Under normal circumstances, you can use either the administrative console or the moveServerToCoreGroup wsadmin command to move clusters. However, during the migration process, the cluster to be moved might have both v7.0 and v5.x members, and the normal commands fail because a v5.x cluster member is not yet a member of any core group.
To move a mixed cluster to a new core group, use the moveClusterToCoreGroup wsadmin command with the optional checkConfig parameter.
For example, suppose Cluster0 has cluster members A, B, C and D. Member A is on a node that has been migrated to v7.0 and is a member of the DefaultCoreGroup, while B, C and D are still on v5.x nodes. To move Cluster0 to core group CG1, use the following command:
$AdminTask moveClusterToCoreGroup {-source DefaultCoreGroup -target CG1 -clusterName Cluster0 -checkConfig false}
When a clustered application server is migrated, the migration utilities determine if other cluster members have already been migrated and automatically place the migrating member in the same core group as other members of the same cluster that are already migrated.
In the example above, member A was moved to core group CG1. When the nodes containing B, C and D are migrated, migration will place these cluster members in CG1 instead of the DefaultCoreGroup. Therefore, it is necessary to run the moveClusterToCoreGroup command only once for each cluster.
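The placement behavior described above can be modeled with a small simulation. This is a toy model written for illustration, not the logic of the migration utilities themselves; the class and method names are hypothetical.

```python
DEFAULT_CORE_GROUP = "DefaultCoreGroup"

class CellModel:
    """Toy model of core group placement for migrated cluster members."""

    def __init__(self):
        self.cluster_core_group = {}   # cluster -> core group of its migrated members
        self.member_core_group = {}    # (cluster, member) -> core group

    def migrate_member(self, cluster, member):
        # A migrating member joins the core group of already migrated members
        # of its cluster, or DefaultCoreGroup if it is the first one.
        group = self.cluster_core_group.setdefault(cluster, DEFAULT_CORE_GROUP)
        self.member_core_group[(cluster, member)] = group
        return group

    def move_cluster(self, cluster, target):
        # Stands in for a single moveClusterToCoreGroup invocation.
        self.cluster_core_group[cluster] = target
        for key in self.member_core_group:
            if key[0] == cluster:
                self.member_core_group[key] = target

cell = CellModel()
print(cell.migrate_member("Cluster0", "A"))   # DefaultCoreGroup
cell.move_cluster("Cluster0", "CG1")          # run once for the cluster
for member in ("B", "C", "D"):
    print(cell.migrate_member("Cluster0", member))   # CG1 each time
```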
When do you configure your core group bridges?
By the time you move your processes to multiple core groups, you must have core group bridges configured and running. This means that the processes that you plan to use as bridge interfaces in your v7.0 target topology might not be available when they are initially needed, because they have not yet been migrated from the v5.x nodes. Therefore, to ensure continual availability of the applications, configure some clustered application servers to be temporary bridge interfaces while the migration continues. After all of the processes have been migrated to v7.0, you can adjust the core group bridge configuration to match your desired v7.0 topology.
Other planning considerations
If your target Version 7.0 configuration requires multiple core group bridges, use the IBM_CS_WIRE_FORMAT_VERSION core group custom property to implement scaling improvements.
Also, if all of your core groups are bridged together and routing shared information amongst each other, the amount of data shared between the core group members is likely to be much larger than normal. Therefore, you should use the following settings to increase the core group memory settings to allow for a more efficient transfer of data:
- Set the IBM_CS_DATASTACK_MEG core group custom property to 100.
- Set the transport buffer size on all processes to 100.
You should also consider adjusting factors such as JVM heap sizes for any node agent or application server that is being used as a bridge interface, and any stand-alone server that is being used as a coordinator. A recommended starting point is to increase the heap size by 512 megabytes. You can also turn on verbose GC monitoring for these processes so that you can fine-tune these heap sizes over time.
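These starting values can be collected in one place so that a script applies them consistently. The following sketch only captures the numbers given above; the function name and the 256 MB starting heap are hypothetical examples, not product defaults.

```python
# Starting values taken from the recommendations above.
CORE_GROUP_CUSTOM_PROPERTIES = {"IBM_CS_DATASTACK_MEG": 100}
TRANSPORT_BUFFER_SIZE = 100
HEAP_INCREASE_MB = 512

def tuned_max_heap_mb(current_max_heap_mb, is_bridge=False, is_coordinator=False):
    """Suggested starting heap for a process, given its current maximum heap.

    Only bridge interfaces and stand-alone coordinator servers get the
    increase; other processes keep their current value.
    """
    if is_bridge or is_coordinator:
        return current_max_heap_mb + HEAP_INCREASE_MB
    return current_max_heap_mb

# Example with an assumed current maximum heap of 256 MB.
print(tuned_max_heap_mb(256, is_bridge=True))   # 768
print(tuned_max_heap_mb(256))                   # 256
```

Treat the result as a starting point only, and use the verbose GC output to adjust it over time.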
Possible migration flows
There are a number of migration flows that you can implement for a successful migration. The following flows assume a common starting point where the dmgr migration has completed and the core groups have been created, but no further actions have been taken.
Migration Flow 1
In this flow, we strictly follow the rules. This flow is unsatisfactory for a number of reasons. As each node is migrated, clusters will need to be moved. This requires stopping all cluster members. This may lead to applications not being available. In addition, the bridges need to be reconfigured at each step.
- Migrate Node1. The DefaultCoreGroup contains the dmgr and all the processes from Node1. Since the DefaultCoreGroup contains fewer than 50 members, no further action is required.
- Migrate Node2. The DefaultCoreGroup now contains more than the recommended number of processes. Balance the processes over 2 core groups by moving half of the clusters and the node agent for Node2 into CoreGroup2. Since there are now multiple core groups being used, we need to configure the core group bridge. Create bridge interface servers on nodes Node1 and Node2. Configure the core group bridge to bridge the two core groups together.
- Migrate Node3. Balance the processes across 3 core groups by moving some of the clusters from the DefaultCoreGroup and CoreGroup2 to CoreGroup3. Move the node agent for Node3 to CoreGroup3. Create the bridge interface server on Node3. Reconfigure the core group bridge to bridge all three core groups together.
- Continue migrating the nodes until the migration is complete. As each node is migrated, some rebalancing and reconfiguring of the core group bridge may be necessary.
Migration Flow 2
In this flow, we temporarily bend the rules. This flow yields better results, as running application servers do not need to be stopped to move them to a different core group. While the migration is in progress, some core groups will not contain an administrative process for some period of time. This is a technical violation of the rules, but is acceptable as long as the core group configuration is not changed while the migration is in progress.
- Migrate Node1. Node1 contains members from all clusters except Cluster10.
- Move all possible clusters to the core groups identified in the final target topology. The dmgr, node agent for Node1 and Cluster1 are already in the DefaultCoreGroup, so no further action is required for them. Move Cluster2 to CoreGroup2, Cluster3 to CoreGroup3 and so on. Create the bridge server for Node1 and place it in CoreGroup2.
- Configure the core group bridge to bridge all 8 core groups together. For simplicity we will temporarily configure a single bridge interface for each access point. (This will introduce a single point of failure while the migration is in progress.) Since most of the bridge interfaces from the final topology are still on v5.x, we need to use application servers as temporary bridge interfaces in 6 of the 8 core groups. This may require a temporary increase in the heap size of selected application servers.
- Migrate Node2. Migration will automatically move the clustered application servers for Node2 to the proper core groups. Since Cluster10 did not have any application servers on Node1, manually move Cluster10 to CoreGroup8. Move the node agent for Node2 to CoreGroup2. Create the bridge server on Node2. Optionally, reconfigure the core group bridge so that some of the temporary bridge interface servers are on Node2 to help spread the load across both nodes.
- Continue migrating the nodes using the same pattern until all nodes have been migrated.
- When all nodes have been migrated, configure preferred coordinator servers. Reconfigure the bridge interfaces to match the final target topology (with two bridge interface servers in each access point). Stop and restart the servers that are serving as temporary bridge interfaces. Restart the new bridge interface servers.
Migration Flow 3
This flow is a variation on Flow 2. The benefit is that the initial bridge load is spread across three nodes instead of one. The disadvantage is that the initial redistribution of clusters to core groups occurs after Node3 has migrated, so the servers running on Node1 and Node2 must be stopped for the move to occur. This may affect application availability.
- Migrate Node1. When this step is complete, the DefaultCoreGroup will contain 38 processes, which is within limits.
- Migrate Node2. When this step is complete, the DefaultCoreGroup will contain 79 processes. While this is larger than the recommended size, it is well within the practical limit.
- Migrate Node3. Move all clusters to the core groups identified in the final topology. Move Cluster2 to CoreGroup2, Cluster3 to CoreGroup3 and so on. Move the three node agents to the proper core groups. Create and move the three bridge interface servers to the proper core groups.
- Select clustered application servers to act as temporary bridges for the core groups that do not yet contain designated bridge interfaces. Temporarily adjust the heap sizes on these servers. Configure the core group bridge to bridge all 8 core groups together.
- Continue migrating the nodes until all nodes have been migrated.
- When all nodes have been migrated, configure preferred coordinators. Reconfigure the bridge interfaces to match the final target topology. Stop and restart processes as required.
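The DefaultCoreGroup sizes quoted in Flow 3 follow from the example topology. The following sketch shows how the core group grows if nothing is moved out of it as nodes are migrated; it counts only the dmgr, node agents, and clustered application servers, and ignores any bridge interface servers that you create along the way. The function names are illustrative.

```python
SERVERS_PER_NODE_PER_CLUSTER = 4

def processes_on_node(node_number):
    """Node agent plus clustered application servers on one node."""
    clusters_on_node = 9 if node_number == 1 else 10   # Cluster10 skips Node1
    return clusters_on_node * SERVERS_PER_NODE_PER_CLUSTER + 1

def default_core_group_sizes(nodes_migrated):
    """DefaultCoreGroup size after each node migration, with no moves."""
    size = 1   # the dmgr is migrated first
    sizes = []
    for node_number in range(1, nodes_migrated + 1):
        size += processes_on_node(node_number)
        sizes.append(size)
    return sizes

print(default_core_group_sizes(3))       # [38, 79, 120]
print(default_core_group_sizes(8)[-1])   # 325
```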