Disaster recovery plan

 

This topic contains a template to use when you create a disaster recovery plan.

 

Section 1. Major goals of this plan

The following list contains the major goals of this plan:

 

Section 2. Personnel

Table 1. Personnel
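Data processing personnel
Name | Position | Address | Telephone
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________
________ | ________ | ________ | ________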

Attach a copy of your organization chart to this section of the plan.

Section 3. Application profile

Use the Display Software Resources (DSPSFWRSC) command to complete this table.

Table 2. Application profile
Application profile
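Application name | Critical? (Yes/No) | Fixed asset? (Yes/No) | Manufacturer | Comments
________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________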
Comment legend:

  1. Runs daily ____________.

  2. Runs weekly on ________.

  3. Runs monthly on ________.
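A minimal sketch of gathering the data for Table 2 with the DSPSFWRSC command named above: print the list of installed software resources, or write it to a database file that you can query later. The library and file name APPDR/SWRSC are hypothetical placeholders, not part of this plan.

```
/* Print the installed software resources for Table 2.               */
DSPSFWRSC OUTPUT(*PRINT)

/* Alternative: write the list to a database file for later review.  */
/* APPDR/SWRSC is a hypothetical library/file name; use your own.    */
DSPSFWRSC OUTPUT(*OUTFILE) OUTFILE(APPDR/SWRSC)
```

The printed list covers what is installed on the system; the Critical?, Fixed asset?, and Comments columns still have to be filled in by hand.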

 

Section 4. Inventory profile

Use the Work with Hardware Products (WRKHDWPRD) command to complete this table. This list should include the following items:

  • Processing units

  • Disk units

  • Models

  • Workstation controllers

  • Personal computers

  • Spare workstations

  • Telephones

  • Air conditioner or heater

  • System printer

  • Tape and diskette units

  • Controllers

  • I/O processors

  • General data communication

  • Spare displays

  • Racks

  • Humidifier or dehumidifier

Table 3. Inventory profile
Inventory profile
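Manufacturer | Description | Model | Serial number | Own or leased | Cost
________ | ________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________ | ________
________ | ________ | ________ | ________ | ________ | ________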

This list should be audited every ________ months.

Table 4. Miscellaneous inventory
Miscellaneous inventory
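Description | Quantity | Comments
________ | ________ | ________
________ | ________ | ________
________ | ________ | ________
________ | ________ | ________
________ | ________ | ________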

This list should include the following items:

  • Tapes

  • PC software

  • File cabinet contents or documentation

  • Tape vault contents

  • Diskettes

  • Emulation packages

  • Programming language software

  • Printer supplies (such as paper and forms)

 

Section 5. Information services backup procedures

  • i5/OS® operating system

    • Daily: Journal receivers are changed at ________ and at ________.

    • Daily: Changed objects in the following libraries and directories are saved at ______:

      • __________

      • __________

      • __________

      • __________

      • __________

      • __________

      • __________

      • __________

      This procedure also saves the journals and journal receivers.

    • On ________ (day) at ________ (time) a complete save operation of the system is done.

    • All save media are stored off site in a vault at ________ (location).

  • Personal computer

    • It is suggested that all personal computers be backed up. Copies of the personal computer files should be uploaded to the system on ________ (date) at ________ (time), just before a complete save operation of the system is done, so that they are saved with the normal system save procedure. This provides a more secure backup of personal computer-related systems, because a local disaster can destroy important personal computer systems.
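A minimal sketch of how the daily i5/OS steps above might be expressed in CL. It assumes a hypothetical library APPLIB, journal APPLIB/APPJRN, directory /appdata, and tape device TAP01; none of these names come from this plan, so substitute your own. It changes the journal receiver, saves changed library objects and changed directory objects, and schedules the receiver change with the job scheduler.

```
/* Hypothetical names: APPLIB, APPLIB/APPJRN, /appdata, TAP01.       */

/* Attach a new, system-generated journal receiver.                  */
CHGJRN JRN(APPLIB/APPJRN) JRNRCV(*GEN)

/* Save objects in APPLIB changed since the library was last saved   */
/* with SAVLIB; OBJJRN(*YES) also includes changed journaled objects.*/
SAVCHGOBJ OBJ(*ALL) LIB(APPLIB) DEV(TAP01) OBJJRN(*YES)

/* Save directory objects changed since the last SAV that used       */
/* UPDHST(*YES), and update the save history for the next run.       */
SAV DEV('/QSYS.LIB/TAP01.DEVD') OBJ(('/appdata')) +
    CHGPERIOD(*LASTSAVE) UPDHST(*YES)

/* Run the receiver change every day at 06:00. Add a second entry    */
/* (with its own JOB name) for the second daily change time.         */
ADDJOBSCDE JOB(CHGRCV1) CMD(CHGJRN JRN(APPLIB/APPJRN) JRNRCV(*GEN)) +
    FRQ(*WEEKLY) SCDDATE(*NONE) SCDDAY(*ALL) SCDTIME('060000')
```

Because SAVCHGOBJ defaults to REFDATE(*SAVLIB), each daily save in this sketch is relative to the last complete save of the library, so a recovery needs the complete save plus only the most recent daily save, followed by any journal receivers saved after it.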

 

Section 6. Disaster recovery procedures

Every disaster recovery plan should address the following three elements:

Emergency response procedures

To document the appropriate emergency response to a fire, a natural disaster, or any other emergency, in order to protect lives and limit damage.

Backup operations procedures

To ensure that essential data processing operational tasks can be conducted after the disruption.

Recovery actions procedures

To facilitate the rapid restoration of a data processing system following a disaster.

Disaster action checklist:

  1. Plan initiation

    1. Notify the senior management.

    2. Contact and set up a disaster recovery team.

    3. Determine the extent of the disaster.

    4. Implement an appropriate application recovery plan, depending on the extent of the disaster (see Section 7. Recovery plan–mobile site, or Section 8. Recovery plan–hot site).

    5. Monitor the progress.

    6. Contact the backup sites and establish the schedules.

    7. Contact all other necessary personnel, both user and data processing.

    8. Contact vendors, both hardware and software.

    9. Notify users of the disruption of service.

  2. Follow-up checklist:

    1. List the teams and the tasks of each team.

    2. Obtain emergency cash and set up transportation to and from the backup site.

    3. Set up the living quarters.

    4. Arrange for eating facilities.

    5. List all personnel and their telephone numbers.

    6. Establish the user participation plans.

    7. Set up the delivery and the receipt of mail.

    8. Establish the emergency office supplies.

    9. Rent or purchase the equipment, as needed.

    10. Determine the applications to be run and in what sequence.

    11. Identify the number of workstations that are needed.

    12. Check out any offline equipment needed for each application.

    13. Check on the forms needed for each application.

    14. Check all data being taken to the backup site before leaving, and leave the inventory profile at the home location.

    15. Arrange with the primary vendors for assistance with problems that occur during the emergency.

    16. Plan for transportation of any additional items needed at the backup site.

    17. Take directions (maps) to the backup site.

    18. Check for additional magnetic tapes.

    19. Take copies of the system and operational documentation and procedural manuals.

    20. Ensure that all personnel involved know their tasks.

    21. Notify the insurance companies.

Recovery start-up procedures for use after a disaster:

  1. Notify _________ Disaster Recovery Services of the need to use its service and of the recovery plan you selected.

    The guaranteed delivery time countdown begins when _________ is notified of the recovery plan selection.

    1. Disaster notification numbers

      ________ or ________

    These telephone numbers are in service from ________ a.m. until ________ p.m. Monday through Friday.

  2. Disaster notification number: ________

    This telephone number is in service for disaster notification after business hours, on weekends, and during holidays. Use this number only to report an actual disaster.

  3. Provide _________ with an equipment delivery site address (when applicable), a contact and an alternate contact for coordinating service, and telephone numbers at which the contacts can be reached 24 hours a day.

  4. Contact power and telephone service suppliers and schedule any necessary service connections.

  5. Notify _________ immediately if any related plans need to be changed.

 

Section 7. Recovery plan–mobile site

  1. Notify _________ of the nature of the disaster and the need to select the mobile site plan.

  2. Confirm in writing the substance of the telephone notification to _________ within 48 hours of the telephone notification.

  3. Confirm that all needed backup media are available to load the backup machine.

  4. Prepare a purchase order to cover the use of the backup equipment.

  5. Notify _________ of plans for a trailer and its placement (on ________ side of ________). (See the Mobile site setup plan in this section.)

  6. Depending on communication needs, notify the telephone company (________) of possible emergency line changes.

  7. Begin setting up power and communications at _________:

    1. Power and communications are prearranged to hook into when the trailer arrives.

    2. At the point where telephone lines come into the building (_________), break the current linkage to the administration controllers (_________). These lines are rerouted to the lines going to the mobile site, where they are linked to the modems.

      The lines currently going from _________ to _________ are linked to the mobile unit through modems.

    3. This might require _________ to redirect lines at the _________ complex to a more secure area in case of disaster.

  8. When the trailer arrives, connect it to power and perform the necessary checks.

  9. Connect to the communications lines and perform the necessary checks.

  10. Begin loading system from backups (see Section 9. Restoring the entire system).

  11. Begin normal operations as soon as possible:

    1. Daily jobs

    2. Daily saves

    3. Weekly saves

  12. Plan a schedule to back up the system so that it can be restored on a home-based computer when a site is available. (Use regular system backup procedures.)

  13. Secure the mobile site and distribute the keys as required.

  14. Keep a maintenance log on mobile equipment.

Mobile site setup plan:

Attach the mobile site setup plan here.

Communication disaster plan:

Attach the communication disaster plan, including the wiring diagrams.

Electrical service:

Attach the electrical service diagram here.

 

Section 8. Recovery plan–hot site

The disaster recovery service provides an alternate hot site. The site has a backup system for temporary use while the home site is being reestablished.

  1. Notify _________ of the nature of the disaster and of the need for a hot site.

  2. Request air shipment of modems to _________ for communications. (See _________ for hot-site communications.)

  3. Confirm in writing the telephone notification to _________ within 48 hours of the telephone notification.

  4. Begin making necessary travel arrangements to the site for the operations team.

  5. Confirm that all needed tapes are available and packed for shipment to restore on the backup system.

  6. Prepare a purchase order to cover the use of the backup system.

  7. Review the checklist for all necessary materials before departing to the hot site.

  8. Make sure that the disaster recovery team at the disaster site has the necessary information to begin restoring the site. (See Section 12. Disaster site rebuilding.)

  9. Provide for travel expenses (cash advance).

  10. After arriving at the hot site, contact home base to establish communications procedures.

  11. Review materials brought to the hot site for completeness.

  12. Begin loading the system from the save tapes.

  13. Begin normal operations as soon as possible:

    1. Daily jobs

    2. Daily saves

    3. Weekly saves

  14. Plan the schedule to back up the hot-site system in order to restore on the home-based computer.

Hot-site system configuration:

Attach the hot-site system configuration here.

 

Section 9. Restoring the entire system

To get your system back to the way it was before the disaster, use the procedures for recovering after a complete system loss in the Systems management: Backup and recovery topic collection.

Before you begin, gather the following tapes, equipment, and information from the on-site tape vault or the off-site storage location:

  • If you install from the alternate installation device, you need both your tape media and the CD-ROM media containing the Licensed Internal Code.

  • All tapes from the most recent complete save operation.

  • The most recent tapes from saving security data (SAVSECDTA or SAVSYS).

  • The most recent tapes from saving your configuration.

  • All tapes containing journals and journal receivers saved since the most recent daily save operation.

  • All tapes from the most recent daily save operation.

  • Program temporary fix (PTF) list (stored with the most recent complete save tapes, weekly save tapes, or both).

  • Tape list from the most recent complete save operation.

  • Tape list from the most recent weekly save operation.

  • Tape list from daily saves.

  • History log from the most recent complete save operation.

  • History log from the most recent weekly save operation.

  • History log from the daily save operations.

  • The Installing, upgrading, or deleting i5/OS and related software topic collection.

  • The Systems management: Backup and recovery topic collection.

  • Telephone directory.

  • Modem manual.

  • Tool kit.
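The recovery itself should follow the Systems management: Backup and recovery topic collection. As a rough sketch only: once the Licensed Internal Code and the operating system have been restored (from the installation and IPL displays, not with these commands), the remaining restores after a complete save commonly run in the order below. TAP01, APPLIB, and APPLIB/APPJRN are hypothetical names, and the exact steps, parameters, and required system state depend on your release and save strategy.

```
/* Sketch only; follow Backup and recovery for your release.          */
/* TAP01, APPLIB, and APPLIB/APPJRN are hypothetical names. Load the  */
/* tape volumes that hold each save before running its restore.       */

/* 1. User profiles (most recent SAVSYS or SAVSECDTA tapes)           */
RSTUSRPRF DEV(TAP01) USRPRF(*ALL)

/* 2. Configuration                                                    */
RSTCFG OBJ(*ALL) DEV(TAP01)

/* 3. All non-system libraries from the complete save                  */
RSTLIB SAVLIB(*NONSYS) DEV(TAP01)

/* 4. Documents and folders                                            */
RSTDLO DLO(*ALL) SAVFLR(*ANY) DEV(TAP01)

/* 5. Directories, omitting QSYS.LIB and QDLS (restored above)         */
RST DEV('/QSYS.LIB/TAP01.DEVD') OBJ(('/*') ('/QSYS.LIB' *OMIT) +
    ('/QDLS' *OMIT))

/* 6. Changed objects (including journal receivers) from the most     */
/*    recent daily save, then apply journaled changes made after it    */
RSTOBJ OBJ(*ALL) SAVLIB(APPLIB) DEV(TAP01)
APYJRNCHG JRN(APPLIB/APPJRN) FILE((APPLIB/*ALL)) RCVRNG(*LASTSAVE)

/* 7. Private authorities, after all objects are restored              */
RSTAUT
```

Restoring private authorities last matters because RSTAUT can only reattach authorities to objects that already exist on the system.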

 

Section 10. Rebuilding process

The management team must assess the damage and begin the reconstruction of a new data center.

If the original site must be restored or replaced, you need to consider the following factors:

  • What is the projected availability of all needed computer equipment?

  • Would it be more effective and efficient to upgrade the computer systems with newer equipment?

  • What is the estimated time needed for repairs or construction of the data site?

  • Is there an alternative site that can more readily be upgraded for computer purposes?

After the decision to rebuild the data center is made, go to Section 12. Disaster site rebuilding.

 

Section 11. Testing the disaster recovery plan

In successful contingency planning, it is important to test and evaluate the plan regularly. Data processing operations are volatile by nature, resulting in frequent changes to equipment, programs, and documentation. These changes make it critical to treat the plan as a changing document. Use these checklists as you conduct your test and decide which areas should be tested.

Table 5. Conducting a recovery test
Item Yes No Applicable Not applicable Comments
Select the purpose of the test. What aspects of the plan are being evaluated?
Describe the objectives of the test. How do you measure successful achievement of the objectives?
Meet with management and explain the test and objectives. Gain their agreement and support.
Have management announce the test and the expected completion time.
Collect the test results at the end of the test period.
Evaluate the results. Is recovery successful? Why or why not?
Determine the implications of the test results. Does the successful recovery in a simple case imply the successful recovery for all critical jobs in the tolerable outage period?
Make recommendations for changes. Call for responses by a given date.
Notify other areas of results. Include users and auditors.
Change the disaster recovery plan manual as necessary.

Table 6. Areas to be tested
Item Yes No Applicable Not applicable Comments
Recovery of individual application systems by using files and documentation stored off site.
Reloading of system tapes and performing an initial program load (IPL) by using files and documentation stored off site.
Ability to process on a different computer.
Ability of management to determine priority of systems with limited processing.
Ability to recover and process successfully without key people.
Ability of the plan to clarify areas of responsibility and the chain of command.
Effectiveness of security measures and security bypass procedures during the recovery period.
Ability to accomplish emergency evacuation and basic first-aid responses.
Ability of users of real-time systems to cope with a temporary loss of online information.
Ability of users to continue day-to-day operations without applications or jobs that are considered noncritical.
Ability to contact the key people or their designated alternates quickly.
Ability of data entry personnel to provide the input to critical systems by using alternate sites and different input media.
Availability of peripheral equipment and processing, such as printers and scanners.
Availability of support equipment, such as air conditioners and dehumidifiers.
Availability of supports: supplies, transportation, and communication.
Distribution of output produced at the recovery site.
Availability of important forms and paper stock.
Ability to adapt plan to lesser disasters.

 

Section 12. Disaster site rebuilding

  • Floor plan of the data center.

  • Determine the current hardware needs and possible alternatives. (See Section 4. Inventory profile.)

  • Data center square footage, power requirements, and security requirements.

    • Square footage ________.

    • Power requirements ________.

    • Security requirements: locked area, preferably with combination lock on one door.

    • Floor-to-ceiling studding.

    • Detectors for high temperature, water, smoke, fire, and motion.

    • Raised floor.

Vendors:

Floor plan:

Include a copy of the proposed floor plan here.

 

Section 13. Record of plan changes

Keep your plan current. Keep records of changes to your configuration, your applications, and your backup schedules and procedures. For example, you can print a list of your current local hardware by typing:

DSPHDWRSC TYPE(*AHW) OUTPUT(*PRINT)
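Other records worth printing and filing with the plan include the PTF list and history log that Section 9 asks you to gather, and the current system values. A minimal sketch using standard CL display commands; rerun them whenever your configuration or software changes so that the filed copies stay current.

```
/* PTF list for the installed licensed programs (see Section 9).    */
DSPPTF OUTPUT(*PRINT)

/* Current system values.                                            */
WRKSYSVAL OUTPUT(*PRINT)

/* System history log (QHST).                                        */
DSPLOG LOG(QHST) OUTPUT(*PRINT)
```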

 
