Database (DB2) performance considerations
The database is one of the most common bottlenecks that can prevent WebSphere Commerce from scaling and performing well. It is therefore crucial that the database be tuned appropriately for the implementation.
Before attempting to tune the database, it is important to realize that a WebSphere Commerce implementation is considered an On-Line Transaction Processing (OLTP) application, and hence the database should be tuned for an OLTP application. Typical characteristics of an OLTP application are:
- It has frequent insert or update activity.
- There are a high number of users accessing the database.
- SQL executing against the database is quick and relatively simple.
WebSphere Commerce ships with default DB2 optimization settings, such as optimization levels and optimization profile registries. It is highly recommended that you thoroughly test any changes that are made to the optimization settings in a production-like environment, before using them in a production system. Users should be aware that changing the optimization settings may affect the overall performance of the application, either immediately or at a later time, such as when the data volume has increased or the data distribution has changed.
This page covers DB2 UDB V8.2 for distributed platforms (UNIX, Windows, and Linux). The following sections identify and reiterate some of the key considerations.
A more detailed document on DB2 for distributed platforms and WebSphere Commerce can be found on the WebSphere Commerce Developer Domain site at the following URL: http://www.ibm.com/developerworks/websphere/zones/commerce/
Physical environment considerations
Considerations for the physical environment are related to how the data is actually spread among the disks and how the memory is managed for the databases.
Layout on disk
Reading from the database and writing back to the database (disk I/O) may become a bottleneck for any application accessing a database. Proper database layout can help reduce the potential for this bottleneck. It is a significant effort to change the physical layout of the database once it has been created, hence proper planning at the initial stages is important.
The first consideration is to ensure that the DB2 transaction logs reside on their own physical disk. Every update issued against the database is written to the logs (in addition to being updated in memory). Hence there will be a lot of disk I/O in the location where the DB2 transaction logs reside. It is a good practice to try to ensure that all read/write activity on the disk is related only to the transaction logs, thus eliminating any I/O contention with other processes that may access the disk.
To set the location for the DB2 transaction logs, issue the following command:
db2 update db cfg for <dbalias> using NEWLOGPATH <path>
You will need to deactivate the database (disconnect all connected sessions and/or issue the db2 deactivate command) before the new location for the logs is used.
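For example, one way to do this from the DB2 command line is shown below; the force step is only needed if applications are still connected:
db2 force application all
db2 deactivate database <dbalias>
db2 activate database <dbalias>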
The second consideration in terms of disk layout is to determine how to manage the table spaces efficiently. One performance principle in the management of Relational Database Management Systems (RDBMSs) is to separate the database table data and database index data onto different physical disks. This enables better query performance, since index scans can execute in parallel with data fetch operations when they are on different physical disks.
In DB2, two types of table spaces can be defined:
- System Managed Storage (SMS) table spaces, which is the default
- Database Managed Storage (DMS) table spaces
The separation of table and index data can only be specified if database managed storage (DMS) table spaces are defined. So should you always use DMS table spaces? Not necessarily. In many installations, you may not have the luxury of many physical disks and hence will not be able to separate data and index. Additionally, SMS table spaces are easier to administer than DMS table spaces, so you may wish to trade off some performance for ease of use. A lot of up-front planning is required for DMS table spaces, since their space is pre-allocated at creation time; SMS table spaces allocate space as it is required by the database.
If you choose to stick with the default and use SMS table spaces, there is a utility that you can run to improve write performance. Issue the following command:
db2empfa <dbalias>
After this command has been issued, as new space is required by the database, it will be allocated one extent at a time, as opposed to one page at a time. If you choose to go with DMS table spaces, in addition to redefining the three user table spaces (USERSPACE1, TAB8K, TAB16K), you will also need to define an additional 4K table space for index data. You then have to modify the table definitions in the schema creation file to point to the new table space. For example, if you chose to name the additional table space IND4K, then you would need to access the file:
<wcs install dir>/schema/db2/wcs.schema.sql
You would replace the INDEX IN clause for every table definition to use IND4K. For example, the following definition:
CREATE TABLE acacgpdesc
   ( acactgrp_id INTEGER NOT NULL,
     displayname VARCHAR(254) NOT NULL,
     description VARCHAR(254),
     language_id INTEGER NOT NULL )
   IN USERSPACE1 INDEX IN USERSPACE1;
changes to:
CREATE TABLE acacgpdesc
   ( acactgrp_id INTEGER NOT NULL,
     displayname VARCHAR(254) NOT NULL,
     description VARCHAR(254),
     language_id INTEGER NOT NULL )
   IN USERSPACE1 INDEX IN IND4K;
This will need to be done prior to the WebSphere Commerce instance creation.
Note: Although you have different page sizes for the table spaces for table data, you only need one size for the table space for index data, because all indexes created are less than 4K in length.
On the same subject of achieving better disk I/O, read performance can be improved by spreading the table spaces across many physical disks. When the database manager has to read from disk, if the data is stored across multiple disks, it can be read in parallel, yielding better read performance. This can be done in one of two ways:
- Specify multiple containers in the table space definition, with each container on a file system that resides on a different physical disk (see the sketch after this list).
- Specify a single container, but the container is a file system that spans several physical disks. You normally would rely on the operating system or a volume manager to help define the file system.
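As a minimal sketch of the first approach, assuming the DMS index table space described earlier is named IND4K, it could be created with two containers on file systems that reside on different physical disks. The container paths and sizes (given in 4 KB pages) are illustrative only:
CREATE TABLESPACE IND4K PAGESIZE 4 K
   MANAGED BY DATABASE
   USING (FILE '/disk1/db2/ind4k.c1' 25600,
          FILE '/disk2/db2/ind4k.c2' 25600);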
A utility such as iostat on UNIX platforms can be used to identify disk I/O bottlenecks.
Memory
DB2 associates memory for the database through the use of bufferpool objects. A bufferpool has a page size associated with it and is linked to one or more table spaces. Thus, if table spaces of different page sizes are created, then bufferpools corresponding to the different page sizes are required.
While you can create multiple bufferpools having the same page size, it is recommended that only one bufferpool per page size be created, for the most efficient usage of memory on the Database server.
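For illustration, if the database also contains table spaces with 8 KB and 16 KB pages (such as the TAB8K and TAB16K table spaces mentioned earlier), one bufferpool per additional page size could be created as follows. The bufferpool names and sizes (in pages) are placeholders, not recommended values:
CREATE BUFFERPOOL BP8K SIZE 25000 PAGESIZE 8 K;
CREATE BUFFERPOOL BP16K SIZE 12500 PAGESIZE 16 K;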
The question is always how much memory to assign to the bufferpools. For DB2 32-bit implementations, there is an operating-system-dependent limit on the amount of memory available for bufferpools, ranging from a maximum of 1.5 GB on AIX platforms to 3.3 GB on Solaris.
Assuming a dedicated Database server, a general rule of thumb is to allocate a large proportion of memory available on the server, about 75% to 80%, but not exceeding the platform limits.
Note that for 64-bit implementations of DB2, the limits are significantly higher. In that case, the bufferpool hit ratio would need to be monitored to determine the optimal setting for the bufferpools. You can also monitor the hit ratio for 32-bit implementations by taking a database snapshot with the following command:
db2 get snapshot for database on <dbalias>
The output generated will contain some statistics on bufferpool logical and physical reads:
Buffer pool data logical reads = DLR
Buffer pool data physical reads = DPR
...
Buffer pool index logical reads = ILR
Buffer pool index physical reads = IPR
In this output, DLR, DPR, ILR and IPR will have actual values. The hit ratio can be computed using the following formula:
(1 - ((DPR + IPR) / (DLR + ILR))) * 100%
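As a purely illustrative calculation, if a snapshot reported DLR = 20000, ILR = 10000, DPR = 1000, and IPR = 500, the hit ratio would be (1 - (1500 / 30000)) * 100% = 95%. A commonly cited target for OLTP workloads is a hit ratio of roughly 95% or higher.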
The size of a bufferpool can be changed using the ALTER BUFFERPOOL statement, or through the BUFFPAGE database configuration parameter if the size of the bufferpool is set to -1.
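For example, the bufferpool name and page count below are placeholders, not recommended values:
db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE 250000"
or, when the bufferpool size is set to -1:
db2 update db cfg for <dbalias> using BUFFPAGE 250000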
Additional tuning and configuration database parameters
There are many parameters to consider for performance. This section describes a subset of these that are considered important for WebSphere Commerce implementations.
To set the values for the parameters, the following command can be used:
db2 update db cfg for <dbalias> using <paramname> <paramvalue>
Parameters related to memory
The database heap (DBHEAP) contains control block information for database objects (tables, indexes and bufferpools), as well as the pool of memory from which the log buffer size (LOGBUFSZ) and catalog cache size (CATALOGCACHE_SZ) are allocated. Its setting is dependent on the number of objects in the database and the size of the two parameters mentioned.
In general, the following formula can be used to estimate the size of the database heap:
DBHEAP = LOGBUFSZ + CATALOGCACHE_SZ + (SUM(# pages in each bufferpool) * 3%)
The log buffer is allocated from the database heap, and is used to buffer writes to the transaction logs for more efficient I/O. The default value of this setting is 128 4 KB pages. A recommended starting point for the log buffer size (LOGBUFSZ) in WebSphere Commerce implementations is 256.
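For example, to apply this starting value:
db2 update db cfg for <dbalias> using LOGBUFSZ 256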
Parameters related to transaction logs
When considering values for the transaction log file size (LOGFILSIZ) and the number of primary (LOGPRIMARY) and secondary (LOGSECOND) logs, some generalizations for OLTP applications can be applied. A high number of short transactions are typical in OLTP systems, hence the size of the log file should be relatively large, otherwise more processing time will be spent managing log files, rather than writing to the transaction logs. A good starting point for the size of the log file in WebSphere Commerce implementations is to set the value to 10000.
Primary log files are allocated when the database is activated, or on the first connect. If a long-running transaction fills up all the primary logs, then secondary logs will be allocated as needed until the LOGSECOND limit is reached. Allocating a secondary log file is a significant performance hit, and should be avoided, or at least minimized when it cannot be avoided entirely.
To determine the right settings for these parameters, monitor the database and see if secondary log files are being allocated. If they are, then you will need to increase the number of primary log files. You can monitor by taking a database snapshot and looking for the following two lines:
Maximum secondary log space used (Bytes) = 0
Secondary logs allocated currently = 0
A good starting point for the number of primary log files (LOGPRIMARY) is anywhere from 6 to 10.
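For example, the following commands apply the suggested log file size and a LOGPRIMARY value from within the suggested range:
db2 update db cfg for <dbalias> using LOGFILSIZ 10000
db2 update db cfg for <dbalias> using LOGPRIMARY 8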
Parameters related to disk I/O
In addition to physical disk layout, several tuning parameters can be manipulated to affect disk I/O. Two key parameters are NUM_IOSERVERS and NUM_IOCLEANERS.
NUM_IOSERVERS specifies the number of processes that are launched to prefetch data from disk to the bufferpool pages.
To maximize read parallelism, this parameter should be set to the number of physical disks that are being used by the database, to enable reading from each disk in parallel.
NUM_IOCLEANERS specifies the number of processes that are launched to flush dirty bufferpool pages to disk.
To maximize usage of system resources, this parameter should be set to the number of CPUs on the system.
The frequency with which dirty bufferpool pages are flushed to disk can be influenced by the CHNGPGS_THRESH parameter. Its value is the percentage of bufferpool pages that can be dirty before a flush to disk is forced. For OLTP applications, a lower value is recommended. For WebSphere Commerce implementations, the value should be set to 40.
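As an illustrative sketch only, for a database spread over four physical disks on a two-CPU server, the three parameters above could be set as follows:
db2 update db cfg for <dbalias> using NUM_IOSERVERS 4
db2 update db cfg for <dbalias> using NUM_IOCLEANERS 2
db2 update db cfg for <dbalias> using CHNGPGS_THRESH 40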
One final parameter to consider in this section is MAXFILOP. It represents the maximum number of files DB2 can have open at any given time. If this value is set too low, valuable processor resources will be spent opening and closing files. This parameter needs to be monitored to determine the correct value, but a good starting point is 128. You can monitor by taking a database snapshot and looking at the following line:
Database files closed = 0
If the value monitored is greater than zero, then the value for this parameter should be increased.
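For example, to apply this starting value:
db2 update db cfg for <dbalias> using MAXFILOP 128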
Parameters related to locking
Reducing locking contention is key to performance. Several parameters exist to influence locking behavior. The total amount of memory available to the database for locks is defined by the LOCKLIST parameter. The MAXLOCKS parameter defines the maximum amount of memory available for each connection to the database. It is represented as a percentage of the LOCKLIST.
Both of these parameters need to be sized appropriately, in order to avoid lock escalations. A lock escalation occurs when all of the memory available to a connection is used, and multiple row locks on a table are exchanged for a single table lock. The amount of memory used for the first lock on an object is 72 bytes, and each additional lock on the same object is 36 bytes.
A good starting value for LOCKLIST can be approximated by assuming that a connection requires about 512 locks at any given time. The following formula can be used:
LOCKLIST = (512 locks/conn * 72 bytes/lock * # of database connections) / 4096 bytes/page
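Purely as an illustration, for 100 database connections this works out to LOCKLIST = (512 * 72 * 100) / 4096 = 900 pages (4 KB pages).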
MAXLOCKS can be set to between 10 and 20 to start. Further monitoring will be necessary to adjust both of these values. In the database snapshot output, look for the following lines:
Lock list memory in use (Bytes) = 432
Lock escalations = 0
Exclusive lock escalations = 0
If lock escalations occur (a value higher than 0), increase the LOCKLIST to minimize the escalations and/or increase the MAXLOCKS value to raise the limit on how much of the LOCKLIST a single connection can use.
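For example, using the illustrative LOCKLIST value computed above and the upper end of the suggested MAXLOCKS range:
db2 update db cfg for <dbalias> using LOCKLIST 900
db2 update db cfg for <dbalias> using MAXLOCKS 20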
Best practices
Here are some of the most common best practices for any IBM DB2 UDB implementation.
Reorganizing data in table spaces
When a high number of inserts, updates, or deletes have been issued against a table in the database, the physical placement of the rows and related indexes may not be optimal. DB2 provides a utility to reorganize data for a table:
db2 REORG TABLE <tabschema>.<tabname>;
DB2 also provides a utility to check if a table or index data needs to be organized. While connected to a database, the following command can be issued:
db2 REORGCHK
This command will check all tables in the database and produce a listing, first by table and then by index. In the listing, an asterisk ('*') in any of the last three columns indicates that the table or index requires a REORG.
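REORGCHK can also refresh the statistics it bases its recommendations on; for example, the following variant updates statistics and checks all tables:
db2 REORGCHK UPDATE STATISTICS ON TABLE ALL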
Collecting statistics
Each SQL statement submitted to the database is parsed, optimized, and a statement access plan is created for execution.
To create this access plan, the optimizer relies on table and index statistics. In order for the optimizer to generate the best access plan, up-to-date statistics are required. Collecting statistics frequently (or at least when a significant amount of data changes) is a good practice.
To collect statistics for a table, the following command can be issued:
db2 RUNSTATS ON TABLE <tabschema>.<tabname> WITH DISTRIBUTION AND DETAILED INDEXES ALL;
Statistics on the catalog tables should also be collected.
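For example, the same RUNSTATS command can be issued against a catalog table; SYSIBM.SYSTABLES is shown here as one representative catalog table:
db2 RUNSTATS ON TABLE SYSIBM.SYSTABLES WITH DISTRIBUTION AND DETAILED INDEXES ALL;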
Related tasks
Update the DB2 database configuration
Related reference
Troubleshoot: Errors when loading or deleting MEMBER data on staging server