Bandwidth and Processing Power
Contents
- Overview
- Buses
- Examples of Buses
- Datapaths
- Examples of Datapaths
- Potential Bandwidth-Related Problems
- Potential Bandwidth-Related Solutions
- Spread the Load
- Reduce the Load
- Increase the Capacity
- In Summary…
- Processing Power
- Facts About Processing Power
- Consumers of Processing Power
- Applications
- The Operating System
- Improving a CPU Shortage
- Reducing the Load
- Increasing the Capacity
- Red Hat Linux-Specific Information
- Monitoring Bandwidth on Red Hat Linux
- Monitoring CPU Utilization on Red Hat Linux
Overview
Bandwidth and processing power are based on hardware that ties directly into a computer's ability to move and process data. As such, the two resources are often interrelated. At its most basic, bandwidth is the capacity for data transfer — in other words, how much data can be moved from one point to another in a given amount of time. Having point-to-point data communication implies two things:
- A set of electrical conductors used to make low-level communication possible
- A protocol to facilitate the efficient and reliable communication of data
There are two types of system components that meet these requirements:
- Buses
- Datapaths
Buses
As stated above, buses enable point-to-point communication and use some sort of protocol to ensure that all communication takes place in a controlled manner. However, buses have other distinguishing features:
- Standardized electrical characteristics (such as the number of conductors, voltage levels, signaling speeds, etc.)
- Standardized mechanical characteristics (such as the type of connector, card size, physical layout, etc.)
- Standardized protocol
Buses are the primary way in which different system components are connected together.
In many cases, buses allow the interconnection of hardware that is made by multiple manufacturers; without standardization, this would not be possible. However, even in situations where a bus is proprietary to one manufacturer, standardization is important because it allows that manufacturer to more easily implement different components by using a common interface — the bus itself.
Examples of Buses
- Mass storage buses (IDE and SCSI)
- Networks (Ethernet and Token Ring; a network can be thought of as an inter-system bus, rather than an intra-system bus)
- Memory buses (PC133 and Rambus)
- Expansion buses (PCI, ISA, USB)
Datapaths
Datapaths can be harder to identify but, like buses, they are everywhere. Also like buses, datapaths enable point-to-point communication. However, unlike buses, datapaths:
- Use a simpler protocol (if any)
- Have little (if any) mechanical standardization
The reason for these differences is that datapaths are normally internal to some system component and are not used to facilitate the ad-hoc interconnection of different components. As such, datapaths are highly optimized for a particular situation, where speed and low cost are preferred over slower and more expensive general-purpose flexibility.
Examples of Datapaths
Here are some typical datapaths:
- CPU to on-chip cache datapath
- Graphics processor to video memory datapath
Potential Bandwidth-Related Problems
There are two ways in which bandwidth-related problems may occur (for either buses or datapaths):
- The bus or datapath may represent a shared resource. In this situation, high levels of contention for the bus reduce the effective bandwidth available to all devices on the bus.
A SCSI bus with several highly-active disk drives would be a good example of this. The highly-active disk drives saturate the SCSI bus, leaving little bandwidth available for any other device on the same bus. The end result is that all I/O to any of the devices on this bus will be slow, even if each device on the bus is not overly active.
- The bus or datapath may be a dedicated resource with a fixed number of devices attached to it. In this case, the electrical characteristics of the bus (and to some extent the nature of the protocol being used) limit the available bandwidth. This is usually more the case with datapaths than with buses. This is one reason why graphics adapters tend to perform more slowly when operating at higher resolutions and/or color depths — for every screen refresh, there is more data that must be passed along the datapath connecting video memory and the graphics processor.
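As a rough, purely illustrative calculation: at a resolution of 1024x768 with 24 bits (3 bytes) per pixel, a single screen refresh moves roughly 2.3MB of data; at 75 refreshes per second, the datapath between video memory and the graphics processor must carry on the order of 175MB of data every second.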
Potential Bandwidth-Related Solutions
Fortunately, bandwidth-related problems can be addressed. In fact, there are several approaches you can take:
- Spread the load
- Reduce the load
- Increase the capacity
Spread the Load
The first approach is to more evenly distribute the bus activity. In other words, if one bus is overloaded and another is idle, perhaps the situation would be improved by moving some of the load to the idle bus.
As a system administrator, this is the first approach you should consider, as often there are additional buses already present in your system. For example, most PCs include at least two IDE channels (which is just another name for a bus). If you have two IDE disk drives and two IDE channels, why should both drives be on the same channel?
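On a 2.4-era Red Hat Linux system, the /proc/ide directory shows how drives map to channels, making it easy to spot two busy drives sharing one channel; a minimal sketch (the channel names assume a typical two-channel controller):

    # hda and hdb live on the first channel (ide0); hdc and hdd on the
    # second (ide1). Two drive entries under one channel means they
    # share that channel's bandwidth:
    ls /proc/ide/ide0 /proc/ide/ide1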
Even if your system configuration does not include additional buses, spreading the load might still be a reasonable approach. The hardware needed to do so would likely cost less than replacing an existing bus with higher-capacity hardware.
Reduce the Load
At first glance, reducing the load and spreading the load appear to be different sides of the same coin. After all, when one spreads the load, it acts to reduce the load (at least on the overloaded bus), correct?
While this viewpoint is correct, it is not the same as reducing the load globally. The key here is to determine if there is some aspect of the system load that is causing this particular bus to be overloaded. For example, is a network heavily loaded due to activities that are unnecessary? Perhaps a small temporary file is the recipient of heavy read/write I/O. If that temporary file was created on a networked file server, a great deal of network traffic could be eliminated by working with the file locally.
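As a hedged illustration of that last point, many programs honor the TMPDIR environment variable, so pointing it at a local filesystem instead of an NFS-mounted one keeps scratch I/O off the network entirely (the path names below are examples only):

    # Scratch files written under an NFS mount generate network traffic:
    #   export TMPDIR=/mnt/fileserver/tmp
    # Keeping them on a local disk eliminates that traffic:
    export TMPDIR=/var/tmp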
Increase the Capacity
The obvious solution to insufficient bandwidth is to increase it somehow. However, this is usually an expensive proposition. Consider, for example, a SCSI controller and its overloaded bus. In order to increase its bandwidth, the SCSI controller (and likely all devices attached to it) would need to be replaced with faster hardware. If the SCSI controller is a separate card, this would be a relatively straightforward process, but if the SCSI controller is part of the system's motherboard, it becomes much more difficult to justify the economics of such a change.
In Summary…
Sometimes, the problem is not the bus itself, but one of the components attached to the bus. For example, consider a SCSI adapter that is connected to a PCI bus. If there are performance problems with SCSI disk I/O, it might be the result of a poorly-performing SCSI adapter, even though the SCSI and PCI buses themselves are nowhere near their bandwidth capabilities.
Processing Power
Often known as CPU power, CPU cycles, and various other names, processing power is the ability of a computer to manipulate data. Processing power varies with the architecture (and clock speed) of the CPU — usually CPUs with higher clock speeds and those supporting larger word sizes have more processing power than slower CPUs supporting smaller word sizes.
Facts About Processing Power
Here are the two main facts about processing power that you should keep in mind:
- Processing power is fixed
- Processing power cannot be stored
Processing power is fixed, in that the CPU can only go so fast. For example, if you need to add two numbers together (an operation that takes only one machine instruction on most architectures), a particular CPU can do it at one speed, and one speed only. With few exceptions, it is not even possible to slow the rate at which a CPU processes instructions.
Processing power is also fixed in another way: it is finite. That is, there are limits to the CPU performance you can put into any given computer. Some systems are capable of supporting a wide range of CPU speeds, while others may not be upgradeable at all, a situation that leads to what is humorously termed a forklift upgrade: a complete replacement of the computer.
Processing power cannot be stored for later use. In other words, if a CPU can process 100 million instructions in one second, one second of idle time equals 100 million instructions that have been wasted.
If we take these facts and look at them from a slightly different perspective, a CPU "produces" a stream of executed instructions at a fixed rate. And if the CPU "produces" executed instructions, that means that something else must "consume" them. The next section describes what these consumers are.
Consumers of Processing Power
There are two main consumers of processing power:
- Applications
- The operating system itself
Applications
The most obvious consumers of processing power are the applications and programs you want the computer to run for you. From a spreadsheet to a database, these are the reasons you have a computer.
A single-CPU system can only do one thing at any given time. Therefore, if your application is running, everything else on the system is not. And the opposite is, of course, true — if something other than your application is running, then your application is doing nothing.
But how is it that many different applications can seemingly run at once under Red Hat Linux? The answer is that Red Hat Linux is a multitasking operating system. In other words, it creates the illusion that many different things are going on simultaneously when in fact that is not possible. The trick is to give each process a fraction of a second's worth of time running on the CPU before giving the CPU to another process for another fraction of a second. If these context switches happen quickly enough, the illusion of multiple applications running simultaneously is achieved.
Of course, applications do more than manipulate data using the CPU. They may wait for user input as well as perform I/O to devices such as disk drives and graphics displays. When these events take place, the application does not need the CPU. At these times, the CPU can be used for other processes running other applications without slowing the waiting application at all.
In addition, the CPU can be used by another consumer of processing power: the operating system itself.
The Operating System
It is difficult to determine how much processing power is consumed by the operating system. The reason for this is that operating systems use a mixture of process-level and system-level code to perform their work. While, for example, it is easy to use top to see what the process running the system logging daemon syslogd is doing, it is not so easy to see how much processing power is being consumed by system-level I/O-related processing.
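For the process-level side, a quick way to check on syslogd is top's batch mode; a minimal sketch (output columns vary somewhat between versions of top):

    # Take a single snapshot of all processes and filter for syslogd:
    top -b -n 1 | grep syslogd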
In general, it is possible to divide this kind of operating system overhead into two types:
- Operating system housekeeping
- Process-related activities
Operating system housekeeping includes activities such as process scheduling and memory management, while process-related activities include any processes that support the operating system itself (including system daemons such as syslogd, klogd, etc.).
Improving a CPU Shortage
When there is insufficient processing power available for the work that needs to be done, you have two options:
- Reducing the load
- Increasing the capacity
Reducing the Load
Reducing the CPU load is something that can be done with no expenditure of money. The trick is to identify those aspects of the system load that are under your control and can be cut back. There are three areas to focus on:
- Reducing operating system overhead
- Reducing application overhead
- Eliminating applications entirely
Reducing Operating System Overhead
In order to reduce operating system overhead, look at your current system load and determine what aspects of it result in inordinate amounts of overhead. These areas could include:
- Reducing the need for frequent process scheduling
- Lowering the amount of I/O performed
Do not expect miracles; in a reasonably-well configured system, it is unlikely that you will see much of a performance increase by trying to reduce operating system overhead. This is due to the fact that a reasonably-well configured system will, by definition, result in a minimal amount of overhead. However, if your system is running with too little RAM, for instance, you may be able to reduce overhead by alleviating the RAM shortage.
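If you suspect such a RAM shortage, free offers a quick first check; a minimal sketch:

    # Report memory and swap usage in megabytes; substantial swap
    # usage is one hint that a RAM shortage is inflating overhead:
    free -m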
Reducing Application Overhead
Reducing application overhead means making sure that the application has everything it needs to run well. Some applications exhibit wildly different behaviors under different environments — an application may become highly compute-bound while processing certain types of data, but not others.
The point to keep in mind here is that you must understand the applications running on your system if you are to enable them to run as efficiently as possible. Often this entails working with your users, and/or your organization's developers, to help uncover ways in which the applications can be made to run more efficiently.
Eliminating Applications Entirely
Depending on your organization, this approach might not be available to you, as it often is not a system administrator's responsibility to dictate which applications will and will not be run. However, if you can identify any applications that are known "CPU hogs", you might be able to influence the powers-that-be to retire them.
Doing this will likely involve more than just yourself. The affected users should certainly be a part of this process; in many cases they may have the knowledge and the political power to make the necessary changes to the application lineup.
Keep in mind that an application may not need to be eliminated from every system in your organization. You might be able to move a particularly CPU-hungry application from an overloaded system to another system that is nearly idle.
Increasing the Capacity
Of course, if it is not possible to reduce the demand for processing power, you must find ways of increasing the processing power that is available. Doing so costs money, but it can be done.
Upgrading the CPU
The most straightforward approach is to determine if your system's CPU can be upgraded. The first step is to see if the current CPU can be removed. Some systems (primarily laptops) have CPUs that are soldered in place, making an upgrade impossible. The rest, however, have socketed CPUs, making upgrades possible — at least in theory.
Next, do some research to determine if a faster CPU exists for your system configuration. For example, if you currently have a 1GHz CPU, and a 2GHz unit of the same type exists, an upgrade might be possible.
Finally, determine the maximum clock speed supported by your system. To continue the example above, even if a 2GHz CPU of the proper type exists, a simple CPU swap is not an option if your system only supports processors running at 1GHz or below.
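On Red Hat Linux, /proc/cpuinfo shows what is currently installed, which is a convenient starting point for this research; a minimal sketch:

    # Display the model name and current clock speed of each CPU:
    grep -E 'model name|cpu MHz' /proc/cpuinfo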
Should you find that you cannot install a faster CPU in your system, your options may be limited to changing motherboards or even the more expensive forklift upgrade mentioned earlier.
However, some system configurations make a slightly different approach possible. Instead of replacing the current CPU, why not just add another one?
Symmetric Multiprocessing
Symmetric multiprocessing makes it possible for a computer system to have more than one CPU sharing all system resources. This means that, unlike a uniprocessor system, an SMP system may actually have more than one process running at the same time.
At first glance, this seems like any system administrator's dream. First and foremost, SMP makes it possible to increase a system's CPU power even if CPUs with faster clock speeds are not available — just by adding another CPU. However, this flexibility comes with some caveats.
The first caveat is that not all systems are capable of SMP operation. Your system must have a motherboard designed to support multiple processors. If it does not, a motherboard upgrade (at the least) would be required.
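Counting the processors the kernel sees is one quick way to confirm what you are starting from; a minimal sketch:

    # Each "processor" stanza in /proc/cpuinfo is one CPU known to the kernel:
    grep -c '^processor' /proc/cpuinfo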
The second caveat is that SMP increases system overhead. This makes sense if you stop to think about it; with more CPUs to schedule work for, the operating system requires more CPU cycles for overhead. Another aspect to this is that with multiple CPUs there can be more contention for system resources. Because of these factors, upgrading a dual-processor system to a quad-processor unit does not result in a 100% increase in available CPU power. In fact, depending on the actual hardware, the workload, and the processor architecture, it is possible to reach a point where the addition of another processor could actually reduce system performance.
Another point to keep in mind is that SMP does not help workloads that consist of one monolithic application with a single stream of execution. In other words, if a large compute-bound simulation program runs as one process and with no threads, it will not run any faster on an SMP system than on a single-processor machine. In fact, it may even run somewhat slower, due to the increased overhead SMP brings. For these reasons, many system administrators feel that when it comes to CPU power, single stream processing power is the way to go. It provides the most CPU power with the fewest restrictions on its use.
While this discussion seems to indicate that SMP is never a good idea, there are circumstances in which it makes sense. For example, environments running multiple highly compute-bound applications are good candidates for SMP. The reason for this is that applications that do nothing but compute for long periods of time keep contention between active processes (and therefore, the operating system overhead) to a minimum, while the processes themselves keep every CPU busy.
One other thing to keep in mind about SMP is that the performance of an SMP system tends to degrade more gracefully as the system load increases. This does make SMP systems popular in server and multi-user environments, as the ever-changing process mix impacts the system-wide load less on a multi-processor machine.
Red Hat Linux-Specific Information
Monitoring Bandwidth on Red Hat Linux
It is difficult to directly monitor bandwidth utilization. However, by looking at device-level statistics, it is possible to roughly gauge whether insufficient bandwidth is an issue on your system.
By using vmstat, it is possible to determine if overall device activity is excessive by viewing the bi and bo fields; in addition, taking note of the si and so fields give you a bit more insight into how much disk activity is due to swap-related I/O:
       procs                      memory    swap          io     system         cpu
     r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
     1  0  0      0 248088 158636 480804   0   0     2     6  120   120  10   3  87

In this example, the bi field shows two blocks/second read from block devices (primarily disk drives), while the bo field shows six blocks/second written to block devices. We can see that none of this activity was due to swapping, as the si and so fields both show a swap-related I/O rate of zero kilobytes/second.
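To watch for saturation as it happens, vmstat can also sample repeatedly; a minimal sketch (the interval and count are arbitrary choices):

    # Report device-level I/O every five seconds, twelve times;
    # sustained high values in bi and bo suggest heavy bus traffic:
    vmstat 5 12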
By using iostat, it is possible to gain a bit more insight into disk-related activity:
    Linux 2.4.18-18.8.0smp (raptor.example.com)     12/15/2002

    avg-cpu:  %user   %nice    %sys   %idle
               5.34    4.60    2.83   87.24

    Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    dev8-0    1.10         6.21        25.08     961342    3881610
    dev8-1    0.00         0.00         0.00         16          0

This output shows us that the device with major number 8 (which is /dev/sda, the first SCSI disk) averaged slightly more than one I/O operation per second (the tps field). Most of the I/O activity for this device was writes (the Blk_wrtn field), with slightly more than 25 blocks written each second (the Blk_wrtn/s field).
If more detail is required, use iostat's -x option:
    Linux 2.4.18-18.8.0smp (raptor.example.com)     12/15/2002

    avg-cpu:  %user   %nice    %sys   %idle
               5.37    4.54    2.81   87.27

    Device:     rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz
    /dev/sda     13.57   2.86  0.36  0.77   32.20   29.05   16.10   14.53    54.52
    /dev/sda1     0.17   0.00  0.00  0.00    0.34    0.00    0.17    0.00   133.40
    /dev/sda2     0.00   0.00  0.00  0.00    0.00    0.00    0.00    0.00    11.56
    /dev/sda3     0.31   2.11  0.29  0.62    4.74   21.80    2.37   10.90    29.42
    /dev/sda4     0.09   0.75  0.04  0.15    1.06    7.24    0.53    3.62    43.01

Over and above the longer lines containing more fields, the first thing to keep in mind is that iostat is now displaying statistics on a per-partition level. By using df to associate mount points with device names, it is possible to use this report to determine if, for example, /home is experiencing an excessive workload.
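A hedged sketch of that mapping step; df reports which device backs a given mount point:

    # Show which partition (for example, /dev/sda3) holds /home:
    df /home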
Actually, each line output from iostat -x is longer and contains more information than this; here is the remainder of each line (with the device column added for easier reading):
    Device:    avgqu-sz   await  svctm  %util
    /dev/sda       0.24   20.86   3.80   0.43
    /dev/sda1      0.00  141.18 122.73   0.03
    /dev/sda2      0.00    6.00   6.00   0.00
    /dev/sda3      0.12   12.84   2.68   0.24
    /dev/sda4      0.11   57.47   8.94   0.17

In this example, it is interesting to note that /dev/sda2 is the system swap partition; it is obvious from the many fields reading 0.00 for this partition that swapping is not a problem on this system.
Another interesting point to note is /dev/sda1. The statistics for this partition are unusual; the overall activity seems low, but why are the average I/O request size (the avgrq-sz field), average wait time (the await field), and the average service time (the svctm field) so much larger than the other partitions? The answer is that this partition contains the /boot/ directory, which is where the kernel and initial ramdisk are stored. When the system boots, the read I/Os (notice that only the rsec/s and rkB/s fields are non-zero; no writing is done here on a regular basis) used during the boot process are for large numbers of blocks, resulting in the relatively long wait and service times iostat displays.
It is possible to use sar for a longer-term view of I/O statistics; for example, sar -b displays a general I/O report:
    Linux 2.4.18-18.8.0smp (raptor.example.com)     12/15/2002

    12:00:00 AM       tps      rtps      wtps   bread/s   bwrtn/s
    12:10:00 AM      0.51      0.01      0.50      0.25     14.32
    12:20:01 AM      0.48      0.00      0.48      0.00     13.32
    …
    06:00:02 PM      1.24      0.00      1.24      0.01     36.23
    Average:         1.11      0.31      0.80     68.14     34.79

Here, like iostat's initial display, the statistics are grouped for all block devices.
Another I/O-related report is produced using sar -d:
    Linux 2.4.18-18.8.0smp (raptor.example.com)     12/15/2002

    12:00:00 AM       DEV       tps    sect/s
    12:10:00 AM    dev8-0      0.51     14.57
    12:10:00 AM    dev8-1      0.00      0.00
    12:20:01 AM    dev8-0      0.48     13.32
    12:20:01 AM    dev8-1      0.00      0.00
    …
    06:00:02 PM    dev8-0      1.24     36.25
    06:00:02 PM    dev8-1      0.00      0.00
    Average:       dev8-0      1.11    102.93
    Average:       dev8-1      0.00      0.00

This report provides a per-device view, but with little detail.
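Because sadc collects data throughout the day, sar can also be limited to a time window or pointed at an older data file; a hedged sketch (the times and file name are examples only):

    # Restrict the general I/O report to business hours:
    sar -b -s 09:00:00 -e 17:00:00

    # Read the same report from the data file for the 15th of the month
    # (Red Hat Linux keeps these under /var/log/sa):
    sar -b -f /var/log/sa/sa15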
As we can see, while there are no explicit statistics that show bandwidth utilization for a given bus or datapath, we can at least see what the devices are doing and use their activity to indirectly determine the bus loading.
Monitoring CPU Utilization on Red Hat Linux
Unlike bandwidth, CPU utilization is much more straightforward to monitor. From a single percentage of CPU utilization in GNOME System Monitor to the more in-depth statistics reported by sar, it is possible to accurately determine how much CPU power is being consumed and by what.
Moving beyond GNOME System Monitor, top is the first command-line resource monitoring tool to consider. Here is a top report from a dual-processor workstation:
      9:44pm  up 2 days, 2 min,  1 user,  load average: 0.14, 0.12, 0.09
    90 processes: 82 sleeping, 1 running, 7 zombie, 0 stopped
    CPU0 states:  0.4% user,  1.1% system,  0.0% nice, 97.4% idle
    CPU1 states:  0.5% user,  1.3% system,  0.0% nice, 97.1% idle
    Mem:  1288720K av, 1056260K used,  232460K free,      0K shrd,  145644K buff
    Swap:  522104K av,       0K used,  522104K free                 469764K cached

      PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    30997 ed        16   0  1100 1100   840 R     1.7  0.0   0:00 top
     1120 root       5 -10  249M 174M 71508 S <   0.9 13.8 254:59 X
     1260 ed        15   0 54408  53M  6864 S     0.7  4.2  12:09 gnome-terminal
      888 root      15   0  2428 2428  1796 S     0.1  0.1   0:06 sendmail
     1264 ed        15   0 16336  15M  9480 S     0.1  1.2   1:58 rhn-applet-gui
        1 root      15   0   476  476   424 S     0.0  0.0   0:05 init
        2 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU0
        3 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU1
        4 root      15   0     0    0     0 SW    0.0  0.0   0:01 keventd
        5 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
        6 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU1
        7 root      15   0     0    0     0 SW    0.0  0.0   0:05 kswapd
        8 root      15   0     0    0     0 SW    0.0  0.0   0:00 bdflush
        9 root      15   0     0    0     0 SW    0.0  0.0   0:01 kupdated
       10 root      25   0     0    0     0 SW    0.0  0.0   0:00 mdrecoveryd

The first CPU-related information is present on the very first line: the load average. The load average is a number that corresponds to the average number of runnable processes on the system. The load average is often listed as a set of three numbers (as seen here), representing the load average for the past 1, 5, and 15 minutes; the low values here indicate that the system in this example was not very busy.
The next line, although not strictly related to CPU utilization, has an indirect relationship, in that it shows the number of runnable processes (here, only one -- remember this number, as it means something special in this example). The number of runnable processes is a good indicator of how CPU-bound a system might be.
Next are two lines that display the current utilization for each of the two CPUs in the system. The utilization statistics are broken down to show whether the CPU cycles expended were done so for user-level or system-level processing; also included is a statistic showing how much CPU time was expended by processes with altered scheduling priorities. Finally, there is an idle time statistic.
Moving down into the process-related section of the display, we find that the process using the most CPU power is top itself; in other words, the one runnable process on this otherwise-idle system was top taking a "picture" of itself.
It is important to remember that the very act of running a system monitor affects the resource utilization statistics you receive. All software-based monitors do this to some extent.
In order to get a more detailed view of CPU utilization, we must change tools. If we look at output from vmstat, we obtain a slightly different view of our example system:
       procs                      memory    swap          io     system         cpu
     r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
     1  0  0      0 233276 146636 469808   0   0     7     7   14    27  10   3  87
     0  0  0      0 233276 146636 469808   0   0     0     0  523   138   3   0  96
     0  0  0      0 233276 146636 469808   0   0     0     0  557   385   2   1  97
     0  0  0      0 233276 146636 469808   0   0     0     0  544   343   2   0  97
     0  0  0      0 233276 146636 469808   0   0     0     0  517    89   2   0  98
     0  0  0      0 233276 146636 469808   0   0     0    32  518   102   2   0  98
     0  0  0      0 233276 146636 469808   0   0     0     0  516    91   2   1  98
     0  0  0      0 233276 146636 469808   0   0     0     0  516    72   2   0  98
     0  0  0      0 233276 146636 469808   0   0     0     0  516    88   2   0  97
     0  0  0      0 233276 146636 469808   0   0     0     0  516    81   2   0  97

Here we have used the command vmstat 1 10 to sample the system once per second, ten times. At first, the CPU-related statistics (the us, sy, and id fields) seem similar to what top displayed, and maybe even a bit less detailed. However, unlike top, we can also gain a bit of insight into how the CPU is being used.
If we look at the system fields, we see that the CPU is handling about 500 interrupts per second on average, and is switching between processes anywhere from 80 to nearly 400 times a second. If you think this seems like a lot of activity, think again, because the user-level processing (the us field) is only averaging 2%, while system-level processing (the sy field) is usually under 1%. Again, this is an idle system.
Looking at the other tools the sysstat package offers, we find that iostat and mpstat provide little additional information over what we have already seen with top and vmstat. However, sar produces a number of reports that can come in handy when monitoring CPU utilization.
The first report is obtained by the command sar -q, and displays the run queue length, total number of processes, and the load averages for the past one and five minutes. Here is a sample:
    Linux 2.4.18-14smp (falcon.example.com)      12/16/2002

    12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5
    12:10:00 AM         3       122      0.07      0.28
    12:20:01 AM         5       123      0.00      0.03
    …
    09:50:00 AM         5       124      0.67      0.65
    Average:            4       123      0.26      0.26

In this example, the system is always busy (given that more than one process is runnable at any given time), but is not overly loaded (because this particular system has more than one processor).
The next CPU-related sar report is produced by the command sar -u:
    Linux 2.4.18-14smp (falcon.example.com)      12/16/2002

    12:00:01 AM       CPU     %user     %nice   %system     %idle
    12:10:00 AM       all      3.69     20.10      1.06     75.15
    12:20:01 AM       all      1.73      0.22      0.80     97.25
    …
    10:00:00 AM       all     35.17      0.83      1.06     62.93
    Average:          all      7.47      4.85      3.87     83.81

The statistics contained in this report are no different from those produced by many of the other tools. The biggest benefit here is that sar makes the data available on an ongoing basis and is therefore more useful for obtaining long-term averages, or for the production of CPU utilization graphs.
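As a rough sketch of how such a graph might be fed, the report can be trimmed with standard text tools (the field positions assume the column layout shown above and may differ between sysstat versions):

    # Print the timestamp and %idle (the last column) for each sample:
    sar -u | awk '$3 == "all" { print $1, $2, $NF }'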
On multiprocessor systems, the sar -U command can produce statistics for an individual processor or for all processors. Here is an example of output from sar -U ALL:
    Linux 2.4.18-14smp (falcon.example.com)      12/16/2002

    12:00:01 AM       CPU     %user     %nice   %system     %idle
    12:10:00 AM         0      3.46     21.47      1.09     73.98
    12:10:00 AM         1      3.91     18.73      1.03     76.33
    12:20:01 AM         0      1.63      0.25      0.78     97.34
    12:20:01 AM         1      1.82      0.20      0.81     97.17
    …
    10:00:00 AM         0     39.12      0.75      1.04     59.09
    10:00:00 AM         1     31.22      0.92      1.09     66.77
    Average:            0      7.61      4.91      3.86     83.61
    Average:            1      7.33      4.78      3.88     84.02

The sar -w command reports on the number of context switches per second, making it possible to gain additional insight into where CPU cycles are being spent:
    Linux 2.4.18-14smp (falcon.example.com)      12/16/2002

    12:00:01 AM   cswch/s
    12:10:00 AM    537.97
    12:20:01 AM    339.43
    …
    10:10:00 AM    319.42
    Average:      1158.25

It is also possible to produce two different sar reports on interrupt activity. The first (produced using the sar -I SUM command) displays a single "interrupts per second" statistic:
    Linux 2.4.18-14smp (falcon.example.com)      12/16/2002

    12:00:01 AM      INTR    intr/s
    12:10:00 AM       sum    539.15
    12:20:01 AM       sum    539.49
    …
    10:40:01 AM       sum    539.10
    Average:          sum    541.00

By using the command sar -I PROC, it is possible to break down interrupt activity by processor (on multiprocessor systems) and by interrupt level (from 0 to 15):
    Linux 2.4.18-18.8.0 (pigdog.example.com)      12/16/2002

    12:00:00 AM  CPU  i000/s  i001/s  i002/s  i008/s  i009/s  i011/s  i012/s
    12:10:01 AM    0  512.01    0.00    0.00    0.00    3.44    0.00    0.00

    12:10:01 AM  CPU  i000/s  i001/s  i002/s  i008/s  i009/s  i011/s  i012/s
    12:20:01 AM    0  512.00    0.00    0.00    0.00    3.73    0.00    0.00
    …
    10:30:01 AM  CPU  i000/s  i001/s  i002/s  i003/s  i008/s  i009/s  i010/s
    10:40:02 AM    0  512.00    1.67    0.00    0.00    0.00   15.08    0.00

    Average:       0  512.00    0.42    0.00     N/A    0.00    6.03     N/A

This report (which has been truncated horizontally to fit on the page) includes one column for each interrupt level; for example, the i002/s field shows the rate for interrupt level 2. If this were a multiprocessor system, there would be one line per sample period for each CPU.
Another important point to note about this report is that sar adds or removes specific interrupt fields if no data is collected for that field. This can be seen in the example report above; the end of the report includes interrupt levels (3 and 10) that were not present at the start of the sampling period.
There are two other interrupt-related sar reports — sar -I ALL and sar -I XALL. However, the default configuration for the sadc data collection utility does not collect the information necessary for these reports. This can be changed by editing the file /etc/cron.d/sysstat, and changing this line:
    */10 * * * * root /usr/lib/sa/sa1 1 1

to look like this:
    */10 * * * * root /usr/lib/sa/sa1 -I 1 1

Keep in mind that this change does cause additional information to be collected by sadc, and results in larger data file sizes. Therefore, make sure your system configuration can support the additional space consumption.
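One simple way to keep an eye on that growth is to watch the data files sadc writes; on Red Hat Linux these live under /var/log/sa:

    # List the daily data files (sa01, sa02, …) and their sizes:
    ls -lh /var/log/sa/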