Friday, June 08, 2007

Some notes on Disasters

A few days ago we were hit by a cyclone, and this brought up the subject of disaster recovery and business continuity again. In the past we have spent many hours discussing vital services and how to ensure continuity for them (or how to ensure that we could recover from a disaster), and thanks to the cyclone we could test our assumptions.


On disaster recovery (DR) I noticed that we were lucky. We had an orderly shutdown before the storm hit us and no real damage was done to the data center. So recovery was easy - it was just a matter of getting the computer floor dry, getting power up and running the start-up sequence.
Things would have been worse if the storm had wiped out our facility, since we had only limited backup equipment - and most data was kept at the same site (just 500 m down the road). But arranging a full mirror at a sufficient distance is expensive ...


On business continuity (BC) I noticed that most services are really not vital. The main office was completely deserted, but production continued. The next morning when I arrived in the data center I saw nobody around, but all services were humming happily. Is it really true that most of what we do is not necessary? It was good to see that most production operations need only basic IT functions and were therefore not hit. This also gives me some thoughts about our plans for IP telephony (IPT), since this is a more vulnerable service than the good old-fashioned system that we have today, so it is clear that we still need to keep the old system as a back-up.


So what were the things I missed during the storm? Mostly I missed my access to the Internet! Being cut off from the rest of the world with just rumours - no real information - can be dangerous. People were making assumptions about the wind, the waves, etc. and so risked making the wrong decisions. So having a good news service up and running was vital. If the company also has IPT in place, then Internet + IPT become the most essential services to keep up. For the rest it is mainly about keeping power & water up and running, so all the basic IT needed for this is vital. Everything else (e.g. information on emergency and recovery procedures) can just run on a laptop, or even be kept on paper.


Bottom line is that for BC only the bare bones of the IT services are important. These services obviously need electricity, so any back-up system in place should be able to run for some time in a limited mode, maybe on a small generator or batteries, so that we keep the basic utilities + the communication to the outside world up and running. Anything else is luxury. And for DR I think it helps to make some conscious decisions about what information (+ apps + hardware) should be available at a recovery site at some distance, because you may need it one day ...
