IBM Tivoli Composite Application Manager for Application Diagnostics, Version 7.1.0.1

Assign local Publish Servers to local data collectors to reduce WAN traffic in a split Managing Server installation - UNIX

By default, the Managing Server kernel assigns Publish Servers to data collectors based on Publish Server availability, using a round robin algorithm. In a multiple-site deployment, this can lead to a data collector on one site being assigned to a Publish Server on a different site, even if a Publish Server on the same site is available. This generates significant network traffic between sites and can adversely affect performance. To ensure that a local Publish Server is assigned to data collectors when one is available, you can set up a subnet policy algorithm instead of round robin.

With the subnet policy algorithm, each data collector is assigned a Publish Server that resides on the same subnet as the data collector; that is, each data collector on Site A is assigned a Publish Server also located on Site A (if such a Publish Server is available).

The subnet policy applies only to the assignment of Publish Servers to data collectors; it does not apply to Archive Agents, which are still assigned to Publish Servers based on the round robin algorithm.

If you have more than one kernel, all kernels must have the same policy settings. Using the roundrobin policy for one kernel and the subnet policy for another kernel is not supported. All other properties related to this policy must also be the same in the properties files of all kernels; you can adjust the policy only on all kernels at the same time.

To enable the subnet policy, which assigns Publish Servers based on IP addresses and hostnames, perform the following steps.

  1. Modify the following parameters in the kl1.properties and kl2.properties files, located in MS_home/etc. A combined example excerpt is shown after this step.

    1. policy=subnet

      In the policy parameter, roundrobin selects the default algorithm; subnet enables the subnet algorithm. If this property is not specified, the kernel uses the default roundrobin algorithm.

    2. subnet.policy.autodetect=true or subnet.policy.autodetect=false

      If subnet.policy.autodetect is set to true, the kernel finds the best matching Publish Server for a data collector based on their IP addresses. The best match is determined by comparing the IP addresses of the data collector and the Publish Server; at least the first two octets of the IP address must be the same. For example, if two Publish Servers have the following addresses:

      • ps1=9.52.42.33

      • ps2=9.51.41.31

      • For the data collector on dc1=9.51.22.22, ps2 is assigned.

      • For the data collector on dc2=9.52.42.33, ps1 is assigned.

      • For the data collector on dc3=9.33.33.33, there is no match. The kernel falls back to the round robin algorithm and assigns the first available Publish Server, which can be either ps1 or ps2.

      For subnet.policy.autodetect=true, you do not need to define any subnet groups. If subnet.policy.autodetect is not specified, true is assumed.

      If subnet.policy.autodetect is set to false, you need to define subnet policy groups. These groups include IP addresses and hostnames of data collector and Publish Server hosts. For example:

      subnet.policy.group1=191.181.*.*,192.82.*.*,devapp-lnx-*,9.52.22.21
      subnet.policy.group2=192.182.*.*,9.9.9.9.9.*,*-lnx-*,server.company.com
      ...
      subnet.policy.groupN=193.183.*.*,92.82.*.*
      N can be any number from 1 to 100; no particular sequence is required. Every group can have as many IP addresses or hostnames as needed, as full values or masks, separated by commas.

      Each group must include IP addresses or hostnames of at least one data collector and at least one Publish Server; otherwise, data collectors in this group will be assigned to other Publish Servers.

      In an IP address mask, every component of the address must be either a number or an asterisk. For example, 192.*.*.2 is a valid mask, but 192.2*.1*3.2 is not a valid mask.

      Each data collector is assigned a Publish Server that falls into the same policy group as the data collector. If there are multiple Publish Servers within the same policy group, the round robin algorithm is used among them. You can specify both 4 byte and 6 byte IP address groups.

      If you do not specify the groups, or a data collector does not fall into any of the specified groups, or there is no Publish Server available under the same policy group as a data collector, the kernel will fall back to autodetection, and assign the closest available Publish Server according to the IP address of the data collector.

      Different IP patterns in the same policy group do not have to reside under the same subnet mask: a data collector is assigned to a Publish Server in the same policy group, but the two might be under different subnet masks. It is your responsibility to provide Publish Servers and define the right policy groups to balance the load and avoid inter-site network traffic.

      For example, the following scenario may have a major performance impact:

      • 2000 data collectors and 6 Publish Servers on Site A

      • 6000 data collectors and 2 Publish Servers on Site B

      In this case, the Publish Servers on Site B receive a significantly larger load. It is your responsibility to either change the assignment policies or add more Publish Servers to Site B.
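
    For illustration, the following is a hypothetical excerpt of kl1.properties (the same settings must also appear in kl2.properties) that enables the subnet policy with explicitly defined groups for a two-site deployment. The IP ranges and hostname patterns are examples only, not values from this documentation; adapt them to your environment.

    # Enable the subnet assignment policy (example values only)
    policy=subnet
    # Explicit groups are defined below, so automatic detection is disabled
    subnet.policy.autodetect=false
    # Hypothetical Site A: data collector and Publish Server hosts on the 10.1.*.* network
    subnet.policy.group1=10.1.*.*,sitea-ps-*,sitea-dc-*
    # Hypothetical Site B: data collector and Publish Server hosts on the 10.2.*.* network
    subnet.policy.group2=10.2.*.*,siteb-ps-*,siteb-dc-*

    If you prefer automatic detection instead, set subnet.policy.autodetect=true (or omit the property) and do not define any groups.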

  2. Uncomment the following property in the kl1.properties and kl2.properties files, located in MS_home/etc (see the sketch after this step):
    kernel.accept.request.duration=5
    You can set a higher value if you wish. The default value is 3.

    It is expected that all Managing Servers (kernels) are started within the set number of minutes (by default, 3 minutes; with the uncommented line as quoted, 5 minutes). The kernel waits for this duration before it starts assigning data collectors to Publish Servers. If one of the Managing Servers is not started within this time, the other kernels start assigning data collectors to Publish Servers without taking the missing Publish Servers into account; this can lead to data collectors being assigned to the wrong Publish Servers.

    For example, you may have two Publish Servers on two different hosts:

    host1=9.52.21.21

    host2=192.168.21.21

    Data Collectors are also running on both hosts.

    Suppose you start Managing Server 1 but wait longer than kernel.accept.request.duration minutes to start Managing Server 2. In this case, because the Publish Server on host2 is not yet available, the kernel assigns the data collector on host2 to the Publish Server on host1.

    Therefore, you need to start all Managing Servers within the duration set in kernel.accept.request.duration.
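
    A sketch of the relevant line in kl1.properties before and after this change, assuming the property is shipped commented out with a leading # (the comments are illustrative only):

    # Before editing (the default wait of 3 minutes applies):
    #kernel.accept.request.duration=5

    # After editing (the kernel waits 5 minutes for all kernels to start before assigning data collectors):
    kernel.accept.request.duration=5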

  3. Stop all Managing Servers by using the MS_home/bin/am-stop.sh script. Check whether any process has not stopped (and is therefore hanging); if so, run the MS_home/bin/am-forcestop.sh script.

  4. Start all Managing Servers by using the MS_home/bin/am-start.sh script.

Tip: You can check the current assignment of data collectors to Publish Servers in the Managing Server Self Diagnosis window. In the Visualization Engine main menu, select Administration > Managing Server > Self diagnosis; then select each Publish Server to view a list of data collector relationships. If any data collector is working with the Managing Server but is not displayed in this window for any of the Publish Servers, select Administration > Server Management > Data Collector Configuration > Configured data collectors from the main menu, then unconfigure the affected data collector and configure it again. After this, it is listed in the Self diagnosis window for one of the Publish Servers.

Sometimes a server belongs to more than one user-defined policy group according to its hostname and IP address. In this case, the following priority rules apply:


Parent topic:

Customization for the Managing Server on UNIX and Linux
