Implement a data maintenance strategy
By implementing an effective data maintenance strategy when a site goes live, we can avoid database issues such as performance degradation, exceeding available storage, and increased backup and restore times.
1. Identify the data to maintain
Monitor database table size and growth rate and collect growth statistics from the production database on either a weekly or monthly basis. IBM recommends to collect this information during the weekend to prevent disruptions to work that is being completed during the week. The method used to track database growth over time depends on the database type.
dbgrowth utility
- Set up the dbgrowth database table:
CREATE TABLE dbgrowth ( rundate DATE DEFAULT CURRENT DATE NOT NULL, tabschema VARCHAR(128) NOT NULL, tabname VARCHAR(128) NOT NULL, card BIGINT DEFAULT 0 NOT NULL ) IN USERSPACE1 INDEX IN USERSPACE1; ALTER TABLE dbgrowth ADD PRIMARY KEY (rundate,tabschema,tabname);
- Create a cron job to populate the dbgrowth table every Sunday. Create a script that is called dbgrowth.sql with the following contents:
SELECT 'WARNING: The last statistics for table ' || rtrim(tabschema) || '.' || tabname || ' ' || ' were taken on ' || char(date(stats_time)) || '. The dbgrowth script requires the stats be up to date.' warning
FROM syscat.tables where date(stats_time) < current date;
-- in case it is run twice
DELETE FROM dbgrowth WHERE rundate = CURRENT DATE;
-- insert
INSERT INTO dbgrowth ( tabschema, tabname, card ) SELECT tabschema, tabname, card
FROM syscat.tables WHERE card >= 0;
- Use the following SQL to list the fastest growing tables:
SELECT CAST( RTRIM(a.tabschema) || '.' || a.tabname AS VARCHAR(40)) tabname,
b.card begincard,
a.card endcard,
DAYS(a.rundate)
- DAYS(b.rundate) numdays, a.card - b.card growth, (a.card-b.card)/(DAYS(a.rundate)-DAYS(b.rundate)) dailyavg
FROM dbgrowth a,
dbgrowth b
WHERE a.tabschema = b.tabschema
AND b.tabname = a.tabname
AND a.tabschema IN ( 'DB2INST1' )
AND a.rundate = ( SELECT MAX(rundate) FROM dbgrowth )
AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
AND a.card - b.card > 1000
ORDER BY a.card - b.card DESC
FETCH FIRST 50 ROWS ONLY WITH UR;
To compare with the previous one use:
AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
To compare with a specific date use:
AND b.rundate = ( SELECT MAX(rundate) - 7 DAYS FROM dbgrowth )
The dbgrowth utility saves the current row count for each database table that is found on the card column of the syscat.tables table. This number is accurate only if the statistics of this database table are current.
dbgrowth utility
- Set up the dbgrowth database table:
CREATE TABLE dbgrowth ( rundate DATE NOT NULL, tabschema VARCHAR2 (30) NOT NULL, tabname VARCHAR2 (50) NOT NULL, card NUMBER DEFAULT 0 NOT NULL ); ALTER TABLE dbgrowth ADD PRIMARY KEY (rundate,tabschema,tabname);
- Create a cron job to populate the dbgrowth table every Sunday. Create a script that is called dbgrowth.sql with the following contents:
SELECT 'WARNING: The last statistics for schema ' || rtrim(owner) || ' were taken on ' || to_char(LAST_ANALYZED,'dd-mon-yyyy') || '. The dbgrowth script requires the stats be up to date.' warning
FROM dba_tables where round(LAST_ANALYZED) < round(sysdate) and owner = 'WCS' and rownum < 2;
-- in case it is run twice
DELETE FROM dbgrowth WHERE rundate = round(sysdate);
-- insert
INSERT INTO dbgrowth ( rundate, tabschema, tabname, card ) SELECT round(sysdate), owner, table_name, nvl(num_rows,0)
FROM dba_tables WHERE OWNER = 'WCS';where WCS is the schema name for the WebSphere Commerce repository.
- Use the following SQL to list the fastest growing tables:
SELECT RTRIM(a.tabschema) || '.' || a.tabname tabname, b.card begincard, a.card endcard, a.rundate - b.rundate numdays, a.card - b.card growth, (a.card-b.card)/(a.rundate-b.rundate) dailyavg FROM dbgrowth a, dbgrowth b WHERE a.tabschema = b.tabschema AND b.tabname = a.tabname AND a.tabschema IN ( 'WCS' ) AND a.rundate = ( SELECT MAX(rundate) FROM dbgrowth ) AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth )) AND a.card - b.card > 1000 AND ROWNUM < 50 ORDER BY a.card - b.card DESC; To compare with the previous one use: AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth )) To compare with a specific date use: AND b.rundate = ( SELECT MAX(rundate) - 7 DAYS FROM dbgrowth )
The dbgrowth utility saves the current row count for each database table that is found on the card column of the dba_tables table. This number is accurate only if the statistics of this database table are current.
Oracle Flashback Technology
Oracle Flashback Technology is a group of Oracle database features that you can use to view the past states of database objects. We can also return database objects to a previous state without using point-in-time media recovery.
Use Oracle Flashback Technology to track historical table data. For more information about using Oracle Flashback Technology, see Oracle Technology Network. Search for Oracle Flashback Technology.
Tips:
- Use the growth statistics to identify data that can be deleted from fast growing tables or that can be moved to a different database. The following database tables are typically the fastest growing tables:
- ORDERS, which includes completed, pending, and discarded orders
- MEMBER, which includes guest and registered users
- ADDRESS, which includes permanent and temporary addresses
- CACHEIVL
- STAGLOG
- If you frequently delete folders in a production environment, run dbclean on the FOLDER table regularly.
- Review the list of default delete scenarios for particular object and type combinations to determine which database tables and which rows to delete for a particular object and object type. For more information, see Database Cleanup utility objects.
2. Define policies
A policy definition describes the length of time to keep information in the database before it is moved to a Decisions Support System (DSS) or cleaned up. Policies that reflect your current business practices help to maintain a consistent database. For example, the following sample policies define how to long to keep order and guest user information:
- The length of time to store completed orders in the database.
- The number of days before a guest user with no completed orders is deleted.
- The length of time to store pending orders for guest and registered users.
3. Enforce policies with scripting
The Database Cleanup utility is a WebSphere Commerce utility that is designed to help you clean up the database. This utility contains a predefined list of delete statements to delete common WebSphere Commerce objects such as users, orders, and catalog data. Tune the default statements that are available for use with the script to correspond to your customized settings. This customized script removes records that are marked for deletion, as well as any information that is defined as outdated in the database maintenance policies. For more information, see Database Cleanup utility.
Testing
Use both functional and nonfunctional testing to verify your data maintenance strategy.Resting options...
Type of testing Description Functional Use functional testing to verify that the deletions performed when dbclean runs are correct:
- After testing is complete, closely verify cascade deletes on foreign key relationships since deleting a row from one database table might result in records in other tables being affected. For example, deleting a row from the OFFER database table might affect related records in the ORDERITEMS database table.
- After running the dbclean utility script, request that business users validate the state of the database to ensure that no data was unexpectedly deleted.
Nonfunctional Use nonfunctional testing to determine the system resources required to support the dbclean utility. To perform this type of testing, you must have a performance environment and the ability to simulate load.Running the dbclean utility might add the following types of stress to the system:
- Higher than normal CPU on the database server.
- A decrease in overall site response time.
- A lock contention, such as a deadlock, timeout, or lock-wait.
Related concepts
Implement a data maintenance strategy