Nine Types of IT Disaster
There are nine types of IT disaster that can happen on any day in an IT environment, from user error to hardware failure. This article discusses how these disasters happen and how we can prepare for them. The original list is taken from Unix Backup and Recovery by W. Curtis Preston.
1. User error
This has been, by far, the biggest percentage of restores in every environment that I have seen. "Hey, I was sklocking my flambality file, and I accidentally pressed the jankle button. Can you restore it, please?" This one is pretty easy, right? What about the common question: "Can you restore it as of about an hour ago?"
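The "as of about an hour ago" question comes down to how far apart your restore points are. The following is only a minimal sketch: the function name, the dates, and the two schedules (nightly backups at 02:00 and hourly snapshots) are assumptions made up for illustration, not something taken from the book.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_restore_point(points, target):
    """Return the newest restore point taken at or before target, or None."""
    ordered = sorted(points)
    idx = bisect_right(ordered, target)
    return ordered[idx - 1] if idx else None


# Hypothetical moment of the user's call and the state they want back.
now = datetime(2024, 1, 10, 14, 30)
target = now - timedelta(hours=1)

# Assumed schedules: one backup per night at 02:00, one snapshot per hour.
nightly = [datetime(2024, 1, day, 2, 0) for day in range(8, 11)]
hourly = [datetime(2024, 1, 10, hour, 0) for hour in range(0, 15)]

from_nightly = latest_restore_point(nightly, target)
from_hourly = latest_restore_point(hourly, target)

# Work done between the restore point and the target time cannot be restored.
print("nightly:", from_nightly, "gap:", target - from_nightly)
print("hourly: ", from_hourly, "gap:", target - from_hourly)
```

With these assumed schedules, the nightly backup can only bring the file back as it was early that morning, while the hourly snapshot gets within half an hour of what the user asked for.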
2. System-staff error
This is less common than user error (unless your users have root), but when it happens, oh boy, does it happen! What happens when you newfs your Informix raw device or delete a user's home directory? These restores need to go really fast, since they're your fault. As for protecting yourself from this type of error, the same holds here as for user error: either typical nightly backups or snapshots can protect you.
3. Hardware failure
Most books talk about protecting yourself from hardware failure, but they usually don't mention that hardware failure can come in two forms: disk drive failure and system-wide failure. It is important to mention this, because it takes two entirely different methods to protect yourself from these failures. Many people do not take this into consideration when planning their data protection plan.
4. Disk drive failure
Protecting your systems from disk drive failure is relatively simple now. Your only decision is how safe you want to be. Mirroring, often referred to as RAID 1, offers the best protection, but it doubles the cost of your initial drive and controller hardware investment. That is why most people choose one of the other levels of RAID (Redundant Arrays of Independent Disks), the most popular being RAID 5. (RAID 5 volumes protect against the loss of a single drive by calculating and storing parity information on each drive.)
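The parity idea can be illustrated with XOR. This is a toy sketch of a single stripe under assumed data (three made-up data blocks plus one parity block), not how any particular RAID controller or volume manager is implemented. A real RAID 5 array also rotates the parity block across the drives from stripe to stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three hypothetical data blocks, each on its own drive.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the second drive: only the other blocks and parity remain.
survivors = [data[0], data[2], parity]

# XOR of everything that survived reproduces the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Because only one block's worth of parity is stored per stripe, this scheme survives the loss of one drive, not two, which matches the single-drive protection described above.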
5. Software failure
Protecting yourself from software failure can be difficult. Operating system bugs, database bugs, and system management software bugs can all cause data loss. Once again, the degree to which you protect yourself from these types of failures depends on which type of backups you use. Taking frequent snapshots is the only way to truly protect yourself from losing data, possibly a lot of data, from software failure.
6. Electronic break-ins, vandalism, and theft
These three causes of data loss are really beyond the scope of this book, but their impact on your system is not. If you do lose data due to any one of these, it's really no different from any other type of data loss. If you want to really protect yourself from losing data in this manner, I highly recommend reading the book from which I borrowed this list, Practical Unix and Internet Security.
7. Natural disasters
Are you prepared for a hurricane, tornado, earthquake, or flood? If you're not, you're not alone. Imagine that your entire state was wiped out. If you are using off-site storage, is their facility close to you? Is it prepared to handle whatever type of natural disasters occur in your area? For example, if your office is in a flood zone, does your data storage company store your backups on the first floor? If they're in the flood zone as well, then your data can be lost in one good rain. If you really want to insure yourself against a major natural disaster, then you should explore real-time, off-site storage at a remote location, discussed later in this chapter.
8. Other disasters
I remember when we used to test our disaster recovery plan at one company. We would pretend that some sort of truck blew up on the street that ran by our data center. The plan was to recover to an alternate building. This would mean that we would have to have off-site storage of media and an alternate site that was prepared to accommodate all our systems. A good way to do this is to separate your production and development systems and place them in different buildings. The development systems can then take the production systems' place, if the production systems are damaged or if power is interrupted to the production building.
9. Archival information
It is a terrible thing to realize that a very important but rarely used file is missing. It is even worse to find out that it has been gone longer than your retention cycle. For example, suppose you keep your backups for only three months, after which you reuse the oldest volume, overwriting any backups that are on that volume. In that case, any files that have been missing for more than three months are impossible to recover. No matter how insistent the user is about how important the files are, no matter how many calls he makes to your supervisors, you will never be able to restore the files. That is why you should archive your volumes on a regular basis.
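The retention arithmetic is easy to check. The sketch below is only an illustration under stated assumptions: backups are taken daily, three months is approximated as 90 days, and the function name and dates are hypothetical.

```python
from datetime import date, timedelta


def is_recoverable(missing_since, today, retention_days):
    """Return True if a retained backup predates the day the file went missing.

    Assumes one backup per day and that backups older than retention_days
    have already been overwritten.
    """
    oldest_backup = today - timedelta(days=retention_days)
    return missing_since > oldest_backup


today = date(2024, 6, 1)   # hypothetical day the user notices the loss
retention_days = 90        # roughly the three-month cycle from the example

recent_loss = is_recoverable(date(2024, 5, 1), today, retention_days)
old_loss = is_recoverable(date(2024, 1, 15), today, retention_days)

print("missing since May 1:", recent_loss)
print("missing since January 15:", old_loss)
```

A file that disappeared in January is already outside the 90-day window by June, so only an archived volume kept outside the normal rotation could still hold it.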