Implement a data maintenance strategy

By implementing an effective data maintenance strategy when a site goes live, we can avoid database issues such as performance degradation, exceeding available storage, and increased backup and restore times.


1. Identify the data to maintain

Monitor database table size and growth rate and collect growth statistics from the production database on either a weekly or monthly basis. IBM recommends to collect this information during the weekend to prevent disruptions to work that is being completed during the week. The method used to track database growth over time depends on the database type.


DB2dbgrowth utility

  1. Set up the dbgrowth database table:

      CREATE TABLE dbgrowth (
            rundate        DATE DEFAULT CURRENT DATE NOT NULL,
            tabschema      VARCHAR(128) NOT NULL,
            tabname        VARCHAR(128) NOT NULL,
            card           BIGINT DEFAULT 0 NOT NULL
      ) IN USERSPACE1 INDEX IN USERSPACE1;
      
      ALTER TABLE dbgrowth ADD PRIMARY KEY (rundate,tabschema,tabname);

  2. Create a cron job to populate the dbgrowth table every Sunday. Create a script that is called dbgrowth.sql with the following contents:

      SELECT 'WARNING: The last statistics for table ' || rtrim(tabschema) || '.' || tabname || ' ' || ' were taken on ' || char(date(stats_time)) || '. The dbgrowth script requires the stats be up to date.' warning
      FROM syscat.tables where date(stats_time) < current date;

      -- in case it is run twice
      DELETE FROM dbgrowth WHERE rundate = CURRENT DATE;

      -- insert
      INSERT INTO dbgrowth ( tabschema, tabname, card ) SELECT tabschema, tabname, card
      FROM syscat.tables WHERE card >= 0;

  3. Use the following SQL to list the fastest growing tables:

      SELECT CAST( RTRIM(a.tabschema) || '.' || a.tabname AS VARCHAR(40)) tabname,
      b.card begincard,
      a.card endcard,
      DAYS(a.rundate)
      - DAYS(b.rundate) numdays, a.card - b.card growth, (a.card-b.card)/(DAYS(a.rundate)-DAYS(b.rundate)) dailyavg
      FROM dbgrowth a,
      dbgrowth b
      WHERE a.tabschema = b.tabschema
      AND b.tabname = a.tabname
      AND a.tabschema IN ( 'DB2INST1' )
      AND a.rundate = ( SELECT MAX(rundate) FROM dbgrowth )
      AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
      AND a.card - b.card > 1000
      ORDER BY a.card - b.card DESC
      FETCH FIRST 50 ROWS ONLY WITH UR;

      To compare with the previous one use:
      AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))

      To compare with a specific date use:
      AND b.rundate = ( SELECT MAX(rundate) - 7 DAYS FROM dbgrowth )

The dbgrowth utility saves the current row count for each database table that is found on the card column of the syscat.tables table. This number is accurate only if the statistics of this database table are current.


Oracledbgrowth utility

  1. Set up the dbgrowth database table:

      CREATE TABLE dbgrowth (
                rundate        DATE NOT NULL,
                tabschema      VARCHAR2 (30) NOT NULL,
                tabname        VARCHAR2 (50) NOT NULL,
                card           NUMBER DEFAULT 0 NOT NULL
          );
      
          ALTER TABLE dbgrowth ADD PRIMARY KEY (rundate,tabschema,tabname);

  2. Create a cron job to populate the dbgrowth table every Sunday. Create a script that is called dbgrowth.sql with the following contents:

      SELECT 'WARNING: The last statistics for schema ' || rtrim(owner) || ' were taken on ' || to_char(LAST_ANALYZED,'dd-mon-yyyy') || '. The dbgrowth script requires the stats be up to date.' warning
      FROM dba_tables where round(LAST_ANALYZED) < round(sysdate) and owner = 'WCS' and rownum < 2;

      -- in case it is run twice
      DELETE FROM dbgrowth WHERE rundate = round(sysdate);

      -- insert
      INSERT INTO dbgrowth ( rundate, tabschema, tabname, card ) SELECT round(sysdate), owner, table_name, nvl(num_rows,0)
      FROM dba_tables WHERE OWNER = 'WCS';

    where WCS is the schema name for the WebSphere Commerce repository.

  3. Use the following SQL to list the fastest growing tables:

      SELECT RTRIM(a.tabschema) || '.' || a.tabname tabname, b.card begincard, a.card endcard, a.rundate 
      - b.rundate numdays, a.card - b.card growth, (a.card-b.card)/(a.rundate-b.rundate) dailyavg
            FROM dbgrowth a,
                 dbgrowth b
           WHERE a.tabschema  = b.tabschema
             AND b.tabname    = a.tabname
             AND a.tabschema IN ( 'WCS' )
             AND a.rundate    = ( SELECT MAX(rundate) FROM dbgrowth )
             AND b.rundate    = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
             AND a.card - b.card > 1000
             AND ROWNUM < 50
           ORDER BY a.card - b.card DESC;
      
      To compare with the previous one use:
              AND b.rundate    = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
      
      To compare with a specific date use:
              AND b.rundate    = ( SELECT MAX(rundate) - 7 DAYS FROM dbgrowth )

The dbgrowth utility saves the current row count for each database table that is found on the card column of the dba_tables table. This number is accurate only if the statistics of this database table are current.


OracleOracle Flashback Technology

Oracle Flashback Technology is a group of Oracle database features that you can use to view the past states of database objects. We can also return database objects to a previous state without using point-in-time media recovery.

Use Oracle Flashback Technology to track historical table data. For more information about using Oracle Flashback Technology, see Oracle Technology Network. Search for Oracle Flashback Technology.

Tips:


2. Define policies

A policy definition describes the length of time to keep information in the database before it is moved to a Decisions Support System (DSS) or cleaned up. Policies that reflect your current business practices help to maintain a consistent database. For example, the following sample policies define how to long to keep order and guest user information:


3. Enforce policies with scripting

The Database Cleanup utility is a WebSphere Commerce utility that is designed to help you clean up the database. This utility contains a predefined list of delete statements to delete common WebSphere Commerce objects such as users, orders, and catalog data. Tune the default statements that are available for use with the script to correspond to your customized settings. This customized script removes records that are marked for deletion, as well as any information that is defined as outdated in the database maintenance policies. For more information, see Database Cleanup utility.


Testing

Use both functional and nonfunctional testing to verify your data maintenance strategy.

Resting options...

Type of testing Description
Functional Use functional testing to verify that the deletions performed when dbclean runs are correct:

  1. After testing is complete, closely verify cascade deletes on foreign key relationships since deleting a row from one database table might result in records in other tables being affected. For example, deleting a row from the OFFER database table might affect related records in the ORDERITEMS database table.

  2. After running the dbclean utility script, request that business users validate the state of the database to ensure that no data was unexpectedly deleted.

Nonfunctional Use nonfunctional testing to determine the system resources required to support the dbclean utility. To perform this type of testing, you must have a performance environment and the ability to simulate load.Running the dbclean utility might add the following types of stress to the system:

  • Higher than normal CPU on the database server.

  • A decrease in overall site response time.

  • A lock contention, such as a deadlock, timeout, or lock-wait.


Related concepts
Implement a data maintenance strategy