Tuning for Oracle9i on UNIX

 


Overview

Oracle9i is a highly optimizable software product. Frequent tuning optimizes system performance and prevents data bottlenecks. Although this chapter is written from the perspective of single-node computer systems, most of the performance tuning tips provided here are also valid when using Oracle9i Real Application Clusters and other Oracle9i features.

Before tuning the system, observe its normal behavior using the tools described in "Operating System Tools".

 

Types of Performance Bottlenecks

Performance bottlenecks are often caused by the following:

  • Memory contention

    Memory contention occurs when processes require more memory than is available. When this occurs, the system pages and swaps processes between memory and disk.

  • Disk I/O contention

    Disk I/O contention is caused by poor memory management, poor distribution of tablespaces and files across disks, or a combination of both.

  • CPU contention

    Although the UNIX kernel usually allocates CPU resources effectively, many processes compete for CPU cycles and this can cause contention. If you installed Oracle9i in a multiprocessor environment, there might be a different level of contention on each CPU.

  • Oracle resource contention

    Contention is also common for Oracle resources such as locks and latches.

 

Operating System Tools

Several operating system tools are available to help you assess database performance and determine database requirements. In addition to providing statistics for Oracle processes, these tools provide statistics for CPU usage, interrupts, swapping, paging, context switching, and I/O for the entire system.

 

vmstat

Use the vmstat command to view process, virtual memory, disk, trap, and CPU activity, depending on the switches you supply with the command. Enter one of the following commands to display a summary of CPU activity eight times, at five-second intervals:

  • HP and Solaris:

    $ vmstat -S 5 8
    
    
  • AIX, Linux, and Tru64:

    $ vmstat 5 8

The following example shows sample output from the command on Solaris:

procs     memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr f0 s0 s1 s3   in   sy   cs us sy id
 0 0 0   1892  5864   0   0  0  0  0  0  0  0  0  0  0   90   74   24  0  0 99
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   46   25   21  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   47   20   18  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  2   53   22   20  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   87   23   21  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   48   41   23  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   44   20   18  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   51   71   24  0  0 100

The w column, under the procs column, shows the number of potential processes that have been swapped out and written to disk. If the value is not zero, swapping is occurring and your system is short of memory. The si and so columns indicate the number of swap-ins and swap-outs per second, respectively. Swap-ins and swap-outs should always be zero.


The output from the vmstat command differs between platforms. See the man page on your platform for information on interpreting the output. 

 

sar

Use the sar command to display cumulative activity counters in the operating system, depending on the switches that you supply with the command. The following command displays a summary of I/O activity ten times, at ten-second intervals:

$ sar -b 10 10

The following example shows sample output from the command on Solaris:

13:32:45 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
13:32:55       0      14     100       3      10      69       0       0
13:33:05       0      12     100       4       4       5       0       0
13:33:15       0       1     100       0       0       0       0       0
13:33:25       0       1     100       0       0       0       0       0
13:33:35       0      17     100       5       6       7       0       0
13:33:45       0       1     100       0       0       0       0       0
13:33:55       0       9     100       2       8      80       0       0
13:34:05       0      10     100       4       4       5       0       0
13:34:15       0       7     100       2       2       0       0       0
13:34:25       0       0     100       0       0     100       0       0

Average        0       7     100       2       4      41       0       0


On Tru64 systems, the sar command is available in the UNIX SVID2 compatibility subset, OSFSVID2400. 

 

iostat

Use the iostat command to report terminal and disk activity, depending on the switches you supply with the command. The output from the iostat command does not include disk request queues, but it shows which disks are busy. This information is valuable when balancing I/O loads.

The following command displays terminal and disk activity five times, at five-second intervals:

$ iostat 5 5

The following example shows sample output from the command on Solaris:

tty          fd0           sd0           sd1           sd3          cpu
 tin tout Kps tps serv  Kps tps serv  Kps tps serv  Kps tps serv  us sy wt id
   0    1   0   0    0    0   0   31    0   0   18    3   0   42   0  0  0 99
   0   16   0   0    0    0   0    0    0   0    0    1   0   14   0  0  0 100
   0   16   0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
   0   16   0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
   0   16   0   0    0    0   0    0    2   0   14   12   2   47   0  0  1 98

Use the iostat command to look for large disk request queues. A request queue shows how long the I/O requests on a particular disk device must wait to be serviced. Request queues are caused by a high volume of I/O requests to that disk or by I/O with long average seek times. Ideally, disk request queues should be at or near zero.
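
Although the default output does not include request queues, on Solaris the -x option reports extended disk statistics, including the average number of transactions waiting for service in the wait column. See the iostat man page on your platform for the equivalent option:

$ iostat -x 5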

 

swap, swapinfo, swapon, and lsps

Use the swap, swapinfo, swapon, or lsps command to report information on swap space usage. A shortage of swap space can cause slow response times or even cause the system to stop responding. The following table lists the appropriate command to use for your platform:

Platform  Command
AIX       lsps -a
HP        swapinfo -m
Linux     swapon -s
Solaris   swap -l
Tru64     swapon -s

The following example shows sample output from the swap -l command on Solaris:

swapfile        dev  swaplo blocks   free
/dev/dsk/c0t3d0s1    32,25      8 197592 162136

 

Linux Tools

On Linux systems, use the free command to view information on swap space, memory, and buffer usage. A shortage of swap space can cause slow response times or even cause the system to stop responding.
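
For example, enter the following command to display memory and swap space usage in megabytes:

$ free -m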

 

Solaris Tools

On Solaris systems, use the mpstat command to view statistics for each processor in a multiprocessor system. Each row of the table represents the activity of one processor. The first row summarizes all activity since the last system reboot; each subsequent row summarizes activity for the preceding interval. All values are events per second unless otherwise noted. The arguments specify the time interval between statistics and the number of iterations.
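
For example, enter the following command to display statistics twice, at five-second intervals:

$ mpstat 5 2

The following example shows sample output from the mpstat command: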

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    1    71   21   23    0    0    0    0    55    0   0   0  99
  2    0   0    1    71   21   22    0    0    0    0    54    0   0   0  99
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0    61   16   25    0    0    0    0    57    0   0   0 100
  2    1   0    0    72   16   24    0    0    0    0    59    0   0   0 100

 

AIX System Management Interface Tool

The AIX System Management Interface Tool (SMIT) provides a menu-driven interface to various system administrative and performance tools. Using SMIT, you can navigate through large numbers of tools and focus on the jobs you want to execute.

 

Base Operating System Tools

The AIX Base Operating System (BOS) contains performance tools that are historically part of UNIX systems or are required to manage the implementation-specific features of AIX. The following table lists the most important BOS tools:

Tool     Description
lsattr   Displays the attributes of devices
lslv     Displays information about a logical volume or the logical volume allocations of a physical volume
netstat  Displays the contents of network-related data structures
nfsstat  Displays statistics about Network File System (NFS) and Remote Procedure Call (RPC) activity
nice     Changes the initial priority of a process
no       Displays or sets network options
ps       Displays the status of one or more processes
reorgvg  Reorganizes the physical-partition allocation within a volume group
time     Prints the elapsed execution, user CPU processing, and system CPU processing time
trace    Records and reports selected system events

 

AIX Performance Toolbox

The AIX Performance Toolbox (PTX) contains tools for monitoring and tuning system activity locally and remotely. PTX consists of two main components, the PTX Manager and the PTX Agent. The PTX Manager collects and displays data from various systems in the configuration by using the xmperf utility. The PTX Agent collects and transmits data to the PTX Manager by using the xmserd utility. The PTX Agent is also available as a separate product called Performance Aide for AIX.

Both PTX and Performance Aide include the following separate monitoring and tuning tools:

Tool       Description
fdpr       Optimizes an executable program for a particular workload
filemon    Uses the trace facility to monitor and report the activity of the file system
fileplace  Displays the placement of a file's blocks within logical or physical volumes
lockstat   Displays statistics about contention for kernel locks
lvedit     Facilitates interactive placement of logical volumes within a volume group
netpmon    Uses the trace facility to report on network I/O and network-related CPU usage
rmss       Simulates systems with various sizes of memory for performance testing
svmon      Captures and analyzes information about virtual-memory usage
syscalls   Records and counts system calls
tprof      Uses the trace facility to report CPU usage at module and source-code-statement levels
BigFoot    Reports the memory access patterns of processes
stem       Permits subroutine-level entry and exit instrumentation of existing executables

 

Performance Tuning Tools

The following table lists the tools that you can use for additional performance tuning on HP 9000 Series HP-UX systems:

Tool     Description
gprof    Creates an execution profile for programs
monitor  Monitors the program counter and calls to certain functions
netfmt   Monitors the network
netstat  Reports statistics on network performance
nfsstat  Reports Network File System (NFS) and Remote Procedure Call (RPC) statistics
nettl    Captures network events or packets by logging and tracing
prof     Creates an execution profile of C programs and displays performance statistics, showing where a program spends most of its execution time
profil   Copies program counter information into a buffer
top      Displays the top processes on the system and periodically updates the information

 

HP Performance Analysis Tools

The following HP-UX performance analysis tools are available:

GlancePlus/UX

This HP-UX utility is an online diagnostic tool that measures the system's activities. GlancePlus displays how system resources are being used. It displays dynamic information about the system's I/O, CPU, and memory usage in a series of screens. You can also use the utility to monitor how individual processes are using resources.

HP PAK

HP Programmer's Analysis Kit (HP PAK) currently consists of two tools, Puma and Thread Trace Visualizer (TTV):

  • Puma collects performance statistics during a program run. It provides several graphical displays for viewing and analyzing the collected statistics.

  • TTV displays trace files produced by the instrumented thread library, libpthread_tr.sl, in a graphical format. It allows you to view how threads are interacting and to find where threads are blocked waiting for resources.

HP PAK is bundled with the HP FORTRAN 77, HP Fortran90, HP C, HP C++, HP ANSI C++, and HP Pascal compilers.

 

Oracle SQL Tuning Tools

Oracle provides a variety of tools for tuning SQL, including the V$ performance views, the EXPLAIN PLAN command, the SQL TRACE facility, the TKPROF facility, and the Autotrace report.
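
For example, the following SQL*Plus session is a minimal sketch of the EXPLAIN PLAN command; it assumes that the PLAN_TABLE table exists in the current schema (created by running the utlxplan.sql script) and that the query references an existing emp table:

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'test1' FOR
  2  SELECT * FROM emp WHERE empno = 7369;

Explained.

SQL> SELECT operation, options, object_name
  2  FROM plan_table
  3  WHERE statement_id = 'test1';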

 

Tuning Memory Management

Start the memory tuning process by measuring paging and swapping space to determine how much memory is available. After you have determined your system's memory usage, tune the Oracle buffer cache.

The Oracle buffer manager ensures that the more frequently accessed data is cached longer. If you monitor the buffer manager and tune the buffer cache, you can have a significant influence on Oracle9i performance. The optimal Oracle9i buffer size for your system depends on the overall system load and the relative priority of Oracle over other applications.

 

Allocate Sufficient Swap Space

Try to minimize swapping because it causes significant UNIX overhead. To check for swapping, enter one of the following commands:

  • On AIX, Linux, Solaris, and Tru64, use the sar or vmstat commands. See the man pages for information on the appropriate options to use with these commands.

  • On HP, use the swapinfo -m command.

If your system is swapping, use the following techniques to conserve memory:

  • Avoid running unnecessary system daemon processes or application processes.

  • Decrease the number of database buffers to free some memory.

  • Decrease the number of UNIX file buffers, especially if you are using raw devices.

To determine how much swap space is in use, enter one of the following commands:

Platform  Command
AIX       lsps -a
HP        swapinfo -m
Linux     swapon -s
Solaris   swap -l
Tru64     swapon -s

To add swap space to your system, enter one of the following commands:

Platform  Command
AIX       chps or mkps
HP        swapon
Linux     swapon -a
Solaris   swap -a
Tru64     swapon -a

Set the swap space to between two and four times the system's physical memory. Monitor the use of swap space and increase it as required.
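
For example, on Solaris, the following commands are one way to create and activate a 1 GB swap file; the file name and location are arbitrary:

# mkfile 1000m /export/swapfile
# swap -a /export/swapfile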

 

Control Paging

Paging might not present as serious a problem as swapping, because an entire program does not have to be stored in memory to run. A small number of page-outs might not noticeably affect the performance of your system.

To detect excessive paging, run measurements during periods of fast response or idle time to compare against measurements from periods of slow response.

Use the vmstat or sar command to monitor paging. See the man pages or your operating system documentation for information on interpreting the results for your platform. The following columns from the output of this command are important on Solaris:

Column Description
vflt/s Indicates the number of address translation page faults. Address translation faults occur when a process references a valid page not in memory.
rclm/s Indicates the number of valid pages that have been reclaimed and added to the free list by page-out activity. This value should be zero.
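
On Solaris, for example, enter the following commands to report paging activity four times, at five-second intervals; the -p option reports page-in and page-fault activity, and the -g option reports page-out and memory-freeing activity:

$ sar -p 5 4
$ sar -g 5 4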

If your system consistently has excessive page-out activity, consider the following solutions:

  • Install more memory.

  • Move some of the work to another system.

  • Configure your kernel to use less memory.

 

Adjust Oracle Block Size

A UNIX system reads entire operating system blocks from the disk. If the database block size is smaller than the UNIX file system buffer size, I/O bandwidth is inefficient. If you adjust the Oracle database block size to be a multiple of the operating system block size, you can increase performance by up to five percent.

The DB_BLOCK_SIZE initialization parameter sets the database block size. You can change the block size by recreating the database.

To see the current value of the DB_BLOCK_SIZE parameter, enter the SHOW PARAMETERS command in SQL*Plus.
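
For example (the value shown is typical, but depends on how the database was created):

SQL> SHOW PARAMETERS db_block_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_size                        integer     8192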

 

Tuning Disk I/O

Balance I/O evenly across all available disks to reduce disk access times. For smaller databases and those not using RAID, ensure that different datafiles and tablespaces are distributed across the available disks.

 

Choose the Appropriate File System Type

Depending on the operating system that you use, you can choose from a range of file systems. File systems have different characteristics, and the techniques they use to access data can have a substantial impact on database performance. The following table lists typical file system choices and the platforms on which they are available:

File System  Platform                        Description
S5           AIX, HP, Solaris                UNIX System V file system
UFS          AIX, HP, Solaris, Tru64         UNIX file system, derived from BSD UNIX
VXFS         AIX, Solaris                    Veritas file system
Raw          AIX, HP, Linux, Solaris, Tru64  Raw devices/volumes (no file system)
EXT2         Linux                           Extended file system for Linux
AdvFS        Tru64                           Advanced file system
CFS          Tru64                           Cluster file system
JFS          AIX                             Journaled file system

The suitability of a file system for an application is usually undocumented. For example, even different implementations of the UNIX file system (UFS) are hard to compare. Performance differences can range from 0 to 20 percent, depending on the file system you choose.

If you choose to use a file system:

  • Make a new file system to ensure that the hard disk is clean and unfragmented.

  • Perform a file system check on the partition before using it for database files.

  • Distribute disk I/O as evenly as possible.

  • Separate log files from database files.

 

Monitoring Disk Performance

To monitor disk performance, use the sar -b and sar -u commands.
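
For example, enter the following commands to report buffer and CPU activity ten times, at ten-second intervals:

$ sar -b 10 10
$ sar -u 10 10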

 

sar -b Output Columns

Columns           Description
bread/s, bwrit/s  Blocks read and blocks written per second (important for file system databases)
pread/s, pwrit/s  Partitions read and partitions written per second (important for raw partition database systems)

An important sar -u column for analyzing disk performance is %wio, the percentage of CPU time waiting on blocked I/O.


Not all Linux distributions display the %wio column in the output of the sar -u command. 

Key indicators are:

  • The sum of the bread, bwrit, pread, and pwrit columns indicates the level of activity of the disk I/O subsystem. The higher the sum, the busier the I/O subsystem. The larger the number of physical drives, the higher the sum threshold number can be. A good default value is no more than 40 for two drives and no more than 60 for four to eight drives.

  • The %rcache column value should be greater than 90 and the %wcache column value should be greater than 60. Otherwise, the system may be disk I/O bound.

  • If the %wio column value is consistently greater than 20, the system is I/O bound.

 

Tuning CPU Usage

This section provides information on tuning CPU usage.

 

Keep All Oracle Users/Processes at the Same Priority

Oracle is designed to operate with all users and background processes operating at the same priority level. Changing priority levels causes unexpected effects on contention and response times.

For example, if the log writer process (LGWR) gets a low priority, it is not executed frequently enough and LGWR becomes a bottleneck. On the other hand, if LGWR has a high priority, user processes may suffer poor response time.

 

Tuning Oracle Resource Contention and UNIX Kernel Parameters

You can improve performance by keeping the UNIX kernel as small as possible. The UNIX kernel typically pre-allocates physical memory, leaving less memory available for other processes such as the Oracle processes.

Traditionally, kernel parameters such as NBUF, NFILE, and NOFILES were used to adjust kernel size. However, most UNIX implementations dynamically adjust those parameters at run time, even though they are present in the UNIX configuration file.

Look for memory-mapped video drivers, networking drivers, and disk drivers that can be removed, freeing more memory for use by other processes.


Remember to make a backup copy of your UNIX kernel. See your operating system documentation for additional information. 

 

Tuning the Operating System Buffer Cache

To take full advantage of raw devices, adjust the size of the Oracle9i buffer cache and, if memory is limited, the operating system buffer cache.

The operating system buffer cache holds blocks of data in memory while they are being transferred from memory to disk, or from disk to memory.

The Oracle9i buffer cache is the area in memory that stores the Oracle database buffers. Because Oracle9i can use raw devices, it does not need to use the operating system buffer cache.

If you use raw devices, increase the size of the Oracle9i buffer cache. If the amount of memory on the system is limited, make a corresponding decrease in the operating system buffer cache size.

Use the sar command to determine whether to increase or decrease the buffer caches.

 

Using Raw Devices/Volumes

 

Guidelines for Using Raw Devices/Volumes

Raw devices/volumes have the following disadvantages when used on UNIX:

  • Raw devices/volumes may not solve problems with file size writing limits.

    To display current file size limits, enter the following command:

    $ ulimit -a 


  • Small client systems might not be able to use sufficiently large raw device/volume partitions.

  • If a particular disk drive has intense I/O activity and performance would benefit from movement of an Oracle datafile to another drive, it is likely that no acceptably sized section exists on a drive with less I/O activity. It might not be possible to move files to other disk drives if you are using raw devices/volumes.

  • Raw devices/volumes may be more difficult to administer than datafiles stored on a file system.

In addition to the factors described in this section, consider the following issues when deciding whether to use raw devices/volumes:

  • Oracle9i Real Application Clusters installation

    Each instance of Oracle9i Real Application Clusters has its own log files. Therefore, in addition to the partitions required for the tablespaces and control files, each instance requires a minimum of three partitions for the log files. All the files must be on disks that can be shared by all nodes of a cluster.

  • Raw disk partition availability

    Use raw devices/volumes for Oracle files only if your site has at least as many raw disk partitions as Oracle datafiles. If disk space is a consideration and the raw disk partitions are already formatted, match datafile size to partition size as closely as possible to avoid wasting space.

    You must also consider the performance implications of using all of the disk space on a few disks as opposed to using less space on more disks.

  • Logical volume manager

    The logical volume manager manages disk space at a logical level. With logical volumes, you can create logical disks based on raw partition availability. The logical volume manager controls fixed-disk resources by:

    • Mapping data between logical and physical storage

    • Allowing data to span multiple disks and to be discontiguous, replicated, and dynamically expanded

  • Dynamic performance tuning

    You can optimize disk performance when the database is online by moving files from disk drives with high activity to disk drives with less activity. Most hardware vendors who provide the logical disk facility also provide a graphical user interface you can use for tuning.

  • Mirroring and online disk replacement

    You can mirror logical volumes to protect against loss of data. If one copy of a mirror fails, dynamic resynchronization is possible. Some vendors also provide the ability to replace drives online in conjunction with the mirroring facility.

    For Oracle9i Real Application Clusters, you can use logical volumes for drives associated with a single UNIX system, as well as for those that can be shared by more than one computer in a UNIX cluster. Shared drives allow all files associated with Oracle9i Real Application Clusters to be placed on shared logical volumes.

 

Raw Device Setup

Keep the following items in mind when creating raw devices:

  • When creating the volumes, ensure that the owner is oracle and the group is oinstall.

  • The size of an Oracle datafile created in a raw partition must be at least two Oracle block sizes smaller than the size of the raw partition.
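
For example, with an 8 KB database block size, a datafile created in a 200 MB (204800 KB) raw partition must be at least 16 KB smaller than the partition. The following SQL statement is a sketch only; the raw device name is hypothetical:

SQL> CREATE TABLESPACE raw_ts
  2  DATAFILE '/dev/rdsk/oracle_vol1' SIZE 204784K;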

 

Using Trace and Alert Files

This section describes the trace (or dump) and alert files that Oracle9i creates to diagnose and resolve operating problems.

 

Trace Files

Each server and background process can write to an associated trace file. When a process detects an internal error, it writes information on the error to its trace file. The filename format of a trace file is processname_unixpid_sid.trc, where:

processname  Is a three- or four-character abbreviated process name identifying the Oracle9i process that generated the file (for example, pmon, dbwr, ora, or reco)
unixpid      Is the UNIX process ID number
sid          Is the instance system identifier

A sample trace filename is $ORACLE_BASE/admin/TEST/bdump/lgwr_1237_TEST.trc.

All trace files for background processes are written to the destination directory specified by the BACKGROUND_DUMP_DEST initialization parameter. All trace files for server processes are written to the destination directory specified by the USER_DUMP_DEST initialization parameter.

Set the MAX_DUMP_FILE_SIZE initialization parameter to at least 5000 to ensure that the trace file is large enough to store error information.
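
For example, the following initialization parameter file entries are a sketch of these settings; the directory paths are illustrative, and the MAX_DUMP_FILE_SIZE value is interpreted as a number of operating system blocks unless a size suffix is given:

BACKGROUND_DUMP_DEST = /u01/app/oracle/admin/TEST/bdump
USER_DUMP_DEST = /u01/app/oracle/admin/TEST/udump
MAX_DUMP_FILE_SIZE = 5000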

 

Alert Files

The alert_sid.log file stores significant database events and messages. Anything that affects the database instance or global database is recorded in this file. This file is associated with a database and is located in the directory specified by the BACKGROUND_DUMP_DEST initialization parameter. If you do not set this initialization parameter, the default directory is $ORACLE_HOME/rdbms/log.
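
For example, to watch messages as they are written to the alert file of an instance with the system identifier TEST (the path follows the sample trace file location shown earlier):

$ tail -f $ORACLE_BASE/admin/TEST/bdump/alert_TEST.log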