
IBM Tivoli Monitoring > Version 6.3 > User's Guides > Agent Builder User's Guide > Troubleshooting


Configuring and tuning data collection

When an Agent Builder agent is created, you can configure and tune its data collection to achieve the best results.

How you configure and tune data collection can differ between Agent Builder agents, and even between attribute groups in a single agent. Agent Builder agents can include two types of data, and they support two basic methods of data collection for the most common type of data.


Parent topic:

Troubleshooting


Data types

An agent collects two types of data:

  1. Most Tivoli Monitoring attribute groups represent snapshots of data. Someone asks for the data and it is returned. Agents use this type of data to represent configuration, performance, status, and other information where a one time collection of a set of data makes sense. This data is called sampled data.

  2. Some Tivoli Monitoring data represents events. In this case, an event happens and the agent must forward data to Tivoli Monitoring. Examples of events are SNMP Traps, Windows Event Log entries, and new records that are written to a log file. For simplicity, these types of data are grouped and referred to as event data.


Sampled data

When sampled data is required, a request is sent to the agent for a specific attribute group. The request might be initiated by clicking a workspace in the Tivoli Enterprise Portal. Other things that might initiate a request are a situation that is running, a data collection for the Warehouse, or a SOAP request. When the agent receives the request, the agent returns the current data for that attribute group. Tivoli Enterprise Portal requests target a specific attribute group in a particular Managed System Name (MSN). Situations and historical requests are more interesting, especially in an agent which includes subnodes. When a situation needs data for an attribute group in a subnode, the agent receives one request with a list of the targeted subnodes. The agent must respond with all the data for the requested attribute group for all of the subnodes before Tivoli Monitoring can work on the next request.

The most straightforward way for an agent to satisfy a request is to collect data every time it receives a request from Tivoli Monitoring. Agent Builder agents do not collect data on every request, because collection often takes time or consumes resources, and in many cases the same data is requested many times in a short period. For example, a user might define several situations that run at the same interval on an attribute group, each signaling a different condition. Each of these situations results in a request to the agent, but you might prefer each of the situations to see the same data. Because each situation sees the same data, results are more consistent and the monitoring agent places less demand on system resources.

The agent developer can configure agents to optimize data collection by choosing to run the collection in one of the following two modes:

  1. On-demand collection: The agent collects data when it receives a request and returns that data.

  2. Scheduled collection: The agent runs data collection in the background on scheduled intervals and returns the most recently collected data when it receives a request.

The agent uses a short-term cache in both of these modes. If another request for data is received while the cache is valid, the agent returns data from the cache without collecting new data for each request. Using data from the cache solves the problem that is caused by multiple concurrent situation requests (and other request types). The amount of time the data remains valid, the scheduled collection interval, the number of threads that are used for collection, and whether the agent runs in on-demand or scheduled mode are all defined by environment variables. Using the environment variables, you can tune each agent for the best operation in its environment.

The following sections illustrate how the agent works in both modes.


Environment variables

An agent determines which mode to use, and how the scheduled data collection runs, based on the values of a set of environment variables. These environment variables can be set in the definition of the agent on the Environment Variables panel. Each environment variable is listed in the menu along with its default value. The environment variables can also be set or modified for an installed agent by editing the agent's environment (env) file on Windows or initialization (ini) file on UNIX. The environment variables that control data collection for sampled attribute groups are described in the following paragraphs.

The most important of these variables are CDP_DP_CACHE_TTL, CDP_DP_REFRESH_INTERVAL, and CDP_DP_THREAD_POOL_SIZE.
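As a hypothetical illustration (the values shown are examples, not recommendations), these variables might appear in an installed agent's env or ini file like this:

```ini
# Scheduled collection mode: a thread pool size of 1 or more selects it.
CDP_DP_THREAD_POOL_SIZE=15
# Collect every attribute group in the background every 300 seconds.
CDP_DP_REFRESH_INTERVAL=300
# Keep cached data valid until the next scheduled collection.
CDP_DP_CACHE_TTL=300
```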

If CDP_DP_THREAD_POOL_SIZE has a value greater than or equal to 1 or the agent includes subnodes, the agent operates in scheduled collection mode. If CDP_DP_THREAD_POOL_SIZE is not set or is 0, the agent runs in on-demand collection mode.
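The mode-selection rule can be sketched as a small helper (hypothetical code, not the agent's actual implementation):

```python
def collection_mode(thread_pool_size, has_subnodes):
    """Return the data collection mode an agent would run in.

    Mirrors the rule described above: a CDP_DP_THREAD_POOL_SIZE of 1 or
    more, or the presence of subnodes, selects scheduled collection.
    thread_pool_size of None models the variable being unset.
    """
    if has_subnodes or (thread_pool_size is not None and thread_pool_size >= 1):
        return "scheduled"
    return "on-demand"  # CDP_DP_THREAD_POOL_SIZE unset or 0
```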

If the agent is running in scheduled mode, the agent automatically collects all attribute groups every CDP_DP_REFRESH_INTERVAL seconds. It uses a set of background threads to do the collection. The number of threads is set by CDP_DP_THREAD_POOL_SIZE. The correct value for CDP_DP_THREAD_POOL_SIZE varies based on what the agent is doing.

Running an agent with a larger thread pool causes the agent to use more memory (primarily for the stack that is allocated for each thread). It does not, however, noticeably increase the processor usage or the actual working set size of the process. The agent is more efficient with the correct thread pool size for the workload. The thread pool size can be tuned to provide the wanted behavior for a particular agent in a particular environment.

When data is collected, it is placed in the internal cache. This cache is used to satisfy further requests until new data is collected. The validity period for the cache is controlled by CDP_DP_CACHE_TTL. By default, the validity period is set to 60 seconds. When an agent is running in scheduled mode, it is best to set the validity period to the same value as CDP_DP_REFRESH_INTERVAL, or slightly larger if data collection can take a long time. When the validity period is set in this way, the data is considered valid until its next scheduled collection.

The final variable is CDP_DP_IMPATIENT_COLLECTOR_TIMEOUT. This variable comes into play only when CDP_DP_CACHE_TTL expires before new data is collected. In that case, the agent immediately schedules another collection for the data, and then waits up to CDP_DP_IMPATIENT_COLLECTOR_TIMEOUT seconds for that collection to complete. If the new collection completes, the cache is updated and fresh data is returned. If the new collection does not complete, the existing data is returned.

The agent does not clear the cache when CDP_DP_CACHE_TTL expires, to avoid a problem that is seen with the Universal Agent. The Universal Agent always clears its data cache when the validity period ends. If the Universal Agent clears its data cache before the next collection completes, it has an empty cache for that attribute group and returns no data until the collection completes. Returning no data becomes a problem when situations are running: any situation that runs after the cache is cleared but before the next collection completes sees no data, and any situations that fired are cleared. The result is a flood of events that fire and clear just because data collection is a little slow. Agent Builder agents do not cause this problem. If the 'old' data causes a situation to fire, generally the same data leaves that situation in the same state. After the next collection completes, the situation gets the new data and either fires or clears based on valid data.
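A simplified, synchronous sketch of this stale-cache behavior (hypothetical names; the real agent collects on background threads, but the decision logic is the same):

```python
import time

def read_attribute_group(cache, collect, ttl, impatient_timeout):
    """Return attribute-group data, preferring fresh data but never returning nothing.

    cache: dict with keys "data" and "collected_at" (None until first collection)
    collect: callable that performs a collection and returns the new data
    ttl: plays the role of CDP_DP_CACHE_TTL, in seconds
    impatient_timeout: plays the role of CDP_DP_IMPATIENT_COLLECTOR_TIMEOUT, in seconds
    """
    now = time.monotonic()
    if cache["collected_at"] is not None and now - cache["collected_at"] <= ttl:
        return cache["data"]  # cache still valid: no new collection needed

    # Cache expired: run a collection and see whether it finishes in time.
    start = time.monotonic()
    fresh = collect()
    if time.monotonic() - start <= impatient_timeout:
        cache["data"] = fresh  # collection finished in time: update the cache
        cache["collected_at"] = time.monotonic()
        return fresh
    return cache["data"]  # collection too slow: return existing data, never an empty result
```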


Attribute groups

Agent Builder agents include two attribute groups that you can use to inspect the operation of data collection and to tune the agent for your environment. The attribute groups are Performance Object Status and Thread Pool Status. These attribute groups are described in (Attributes reference). When these attribute groups are used to tune data collection performance, the most useful data is:


Event data

Agent Builder agents can expose several types of event data. Some behavior is common for all event data. The agent receives each new event as a separate row of data. When a row of event data is received, it is sent immediately to Tivoli Monitoring for processing, and added to an internal cache in the agent. Situations and historical collection are performed by Tivoli Monitoring when each row is sent to Tivoli Monitoring. The cache is used to satisfy Tivoli Enterprise Portal or SOAP requests for the data. The agent can use the cache to perform duplicate detection, filtering, and summarization if defined for the attribute group. The size of the event cache for each attribute group is set by CDP_PURE_EVENT_CACHE_SIZE. This cache contains the most recent CDP_PURE_EVENT_CACHE_SIZE events with the most recent event returned first. There are separate caches for each event attribute group. When the cache for an attribute group fills, the oldest event is dropped from the list.
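The per-group event cache behaves like a bounded, newest-first buffer; a minimal sketch (hypothetical names, not the agent's internals):

```python
from collections import deque

class EventCache:
    """Bounded cache of the most recent events, newest first."""

    def __init__(self, max_size):
        # max_size plays the role of CDP_PURE_EVENT_CACHE_SIZE.
        # When the deque is full, appending drops the oldest event.
        self._events = deque(maxlen=max_size)

    def add(self, event):
        # In the real agent the event is also forwarded to Tivoli
        # Monitoring immediately; the cache only serves later requests.
        self._events.append(event)

    def snapshot(self):
        # Most recent event is returned first.
        return list(reversed(self._events))
```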

The Agent Builder agent can expose events for:

These events are handled in the most appropriate way for each of the sources. SNMP Traps and Informs, JMX notifications, and events from the Java API and socket providers are received asynchronously and forwarded to Tivoli Monitoring immediately. There is no requirement to tune these collectors. The agent subscribes to receive Windows Event Log entries from the operating system by using the Windows Event Log API. If the agent is using the older Event Logging API, it polls the system for new events by using the thread pool settings. For joined attribute groups where one of the data sources is an event data source, there is no tuning to apply to the joined attribute group itself, though it does benefit from any tuning applied to the event source group.

File monitoring is more complicated. The agent must monitor the existence of the files and detect when new records are added to them. The agent can be configured to monitor files by using patterns for the file name or a static name. Because the set of files that matches the patterns can change over time, the agent checks for new or changed files every KUMP_DP_FILE_SWITCH_CHECK_INTERVAL seconds. This global environment variable governs all file monitoring in an agent instance. When the agent determines the appropriate files to monitor, it must determine when the files change. On Windows systems, the agent uses operating system APIs to listen for these changes; the agent is informed when the files are updated and processes them immediately. On UNIX systems, the agent checks for file changes every KUMP_DP_EVENT seconds. This global environment variable also governs all file monitoring in an agent instance. When the agent notices that a file changed, it processes all of the new data in the file and then waits for the next change.
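As a hypothetical example (values are illustrative only), the file-monitoring intervals could be tuned in the agent's environment file:

```ini
# Re-evaluate which files match the monitored name patterns every 300 seconds.
KUMP_DP_FILE_SWITCH_CHECK_INTERVAL=300
# On UNIX systems, check monitored files for new records every 5 seconds.
KUMP_DP_EVENT=5
```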


Examples and advanced tuning


Example

Environment variables that are used for more advanced tuning are defined at the agent level. You set these variables one time, and they apply to all of the attribute groups in the agent.

You can make some variables apply to individual attribute groups, while they still have a global setting that applies to all other attribute groups in the agent.

Suppose you defined an agent to include six attribute groups: SampledDataOne, SampledDataTwo, and SampledDataThree (sampled data), and EventDataOne, EventDataTwo, and EventDataThree (event data).

You might set the following default variables: CDP_DP_REFRESH_INTERVAL=60 and CDP_PURE_EVENT_CACHE_SIZE=100.

As a result, all of the attribute groups which contain sampled data (SampledDataOne, SampledDataTwo, and SampledDataThree) would be collected every 60 seconds. Each of the event attribute groups (EventDataOne, EventDataTwo, and EventDataThree) would store the last 100 events in their cache.

These settings might work perfectly, or there might be reasons to control the settings at a more granular level. For example, what if EventDataOne generally receives 10 times as many events as EventDataTwo and EventDataThree? To further complicate things, there is a link between EventDataOne and EventDataTwo: when one event is received for EventDataTwo, there are always multiple events for EventDataOne, and users want to correlate these events. There is no single correct setting for the cache size; it would be better to have EventDataOne store a larger number of events and EventDataTwo a smaller number. You can achieve this by setting CDP_PURE_EVENT_CACHE_SIZE to the size that makes sense for most of the event attribute groups (100 seems good), and then setting CDP_EVENTDATAONE_PURE_EVENT_CACHE_SIZE to 1000. That way, all of the corresponding events are visible in the Tivoli Enterprise Portal.
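Expressed in the agent's environment file, the cache-size settings from this example would look like this:

```ini
# Default cache size for all event attribute groups.
CDP_PURE_EVENT_CACHE_SIZE=100
# Larger override for the high-volume EventDataOne group,
# so its correlated events remain visible alongside EventDataTwo.
CDP_EVENTDATAONE_PURE_EVENT_CACHE_SIZE=1000
```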

The same thing can be done with CDP_DP_REFRESH_INTERVAL. Set a default value that works for the largest number of attribute groups in the agent, and then set CDP_<attribute group name>_REFRESH_INTERVAL for the attribute groups that must be collected differently. To optimize collection, set the default CDP_DP_REFRESH_INTERVAL to match the CDP_DP_CACHE_TTL value. CDP_DP_CACHE_TTL is a global value, so if it is set to a value less than a refresh interval, unexpected collections might occur.
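Continuing the example, a per-group refresh override might look like this (the 30-second value and the choice of SampledDataThree are illustrative assumptions):

```ini
# Defaults: collection interval matched to the cache validity period.
CDP_DP_REFRESH_INTERVAL=60
CDP_DP_CACHE_TTL=60
# Collect SampledDataThree more often than the other sampled groups.
# Keeping the override at or above CDP_DP_CACHE_TTL is not required here,
# but the global CDP_DP_CACHE_TTL must not exceed any group's interval needs.
CDP_SAMPLEDDATATHREE_REFRESH_INTERVAL=30
```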

