1.2.3 Causes of downtime

The causes of downtime can be either planned events or unplanned events. Planned events can account for as much as 30% of downtime. As mentioned before, rolling upgrades and hot replacements can reduce planned downtime. However, the most important issue is how to minimize unplanned downtime, because nobody can predict when it will occur, and all businesses require the system to be up during business hours.

Studies have shown that software failures and human error are responsible for a very high percentage of unplanned downtime. Software failures include network software failure, server software failure, and client software failure. Human errors can stem from missing skills, but also from system management tools that are not easy to use.

Hardware failures and environmental problems also account for unplanned downtime, although to a far lesser extent than software failures and human error. Functions such as state-of-the-art LPAR capabilities with self-optimizing resource adjustments, Capacity on Demand (to avoid overloading of systems), and redundant hardware in the systems (to avoid single points of failure) can further reduce the downtime caused by hardware failures. You can find more information about LPAR for the IBM eServer iSeries and pSeries systems in the following resources:

- Logical Partitions on the IBM PowerPC: A Guide to Working with LPAR on POWER5 for IBM eServer i5 Servers, SG24-8000

- Advanced POWER Virtualization on IBM eServer p5 Servers: Introduction and Basic Configuration, SG24-7940

You can find information about Capacity on Demand at:

http://www.ibm.com/servers/eServer/about/cod/

Many environmental problems are data center related. A standby system at the same site might not suffice, because the entire site environment might be affected. Geographic clustering and data replication can minimize downtime caused by such environmental problems.

An end-to-end WebSphere high availability system that eliminates single points of failure in all parts of the system can minimize both planned and unplanned downtime. We describe the implementation of such a WebSphere high availability system throughout this book.
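
To make the effect of removing single points of failure concrete, the following minimal sketch compares a chain of single components with the same chain built from redundant pairs. It is an illustration only: the tier names and the 99% availability figures are hypothetical, and the formulas assume that components fail independently and that failover is instantaneous, which real systems only approximate.

```python
from math import prod

HOURS_PER_YEAR = 8760  # 365 days; ignores leap years


def series_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of `copies` identical components where any one suffices.

    Assumes independent failures and instantaneous failover.
    """
    return 1 - (1 - availability) ** copies


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical per-tier availabilities, chosen only for illustration.
tiers = {"web server": 0.99, "application server": 0.99, "database": 0.99}

# Every tier is a single point of failure.
single = series_availability(tiers.values())

# Every tier is duplicated, so no single component can take the system down.
clustered = series_availability(
    redundant_availability(a, copies=2) for a in tiers.values()
)

print(f"single components: {single:.6f} "
      f"({downtime_hours_per_year(single):.1f} h/year down)")
print(f"redundant pairs:   {clustered:.6f} "
      f"({downtime_hours_per_year(clustered):.1f} h/year down)")
```

Under these assumptions the chain of single components is less available than any one of its parts, whereas duplicating every tier reduces the expected downtime by roughly two orders of magnitude. The model covers only independent component failures; site-wide environmental problems affect both copies at once, which is why geographic clustering is treated separately.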
