Backup basics and best practices: developing a backup strategy
A proper backup strategy should be part of every disaster recovery plan. Every company should have at least a basic disaster recovery plan (DRP) and business continuity plan, and should employ risk management techniques.
Data backup and recovery falls into the data loss prevention category, alongside surge protection, uninterruptible power supplies (UPS), fire prevention systems and data security software (IDS, antivirus software, etc.).
A backup is not a simple thing. You can't just throw your files on some random storage array every now and then and expect things to work out. A proper backup requires planning, a backup strategy, risk assessment and teamwork.
1) What does a backup strategy help us protect against?
- Disk failure: Face it, disks fail. Often.
- Filesystem or disk corruption, and other events that leave a disk in an inconsistent state: I've seen this one too many times. You run out of inodes on a filesystem that also needs an fsck, but you can't run fsck on it, and you can't mount it either. Or your filesystem simply breaks and corrupts your data. Filesystem corruption is more common after power failures and forced shutdowns.
- File deletion and accidents: I've seen things like "# rm -rf $HOME/*" on a UNIX system (where the default root $HOME is /, of course), and various other accidents involving pipes or dd. It's usually easier to restore files from a backup than to try to recover them after they've been erased. Also, allowing testing and development on production systems will sooner or later lead to an accident like this taking down your main production database.
- Stolen or destroyed disks / machines: Laptops are especially vulnerable to this.
- Tampering with the data: Viruses, exploits and hackers may modify or tamper with your data. You may need to roll back to a previous state of your data.
You must fully understand why you need a backup strategy and what it is you're protecting. Don't just think of data; think of your company's reputation, your job security, loss of revenue and so on.
2) Who is responsible for the integrity of the data?
The data owner, which in most cases is upper management. It's management's responsibility to do a risk assessment and deploy the proper business continuity and disaster recovery plans. They often decide to delegate permissions and responsibility for such tasks down the chain of command.
3) What do you make backup copies of?
Just your production database? The whole system? Just the critical data tables?
It's your responsibility to assess what data is critical and how long it would take to restore a system in case of a disaster. Hope for the best, but expect the worst. Can you cope with a fire? How about an earthquake? How about a disgruntled employee purposely altering or erasing your data?
4) How often do you back up?
Do you need continuous data protection (CDP)? Can you accept losing a day's work? How about losing one day of everyone's work? If you have 1000 users, that may well be around 3 years' worth of work right there, in a single day. What will that cost you?
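To put a number on that question, here is a minimal Python sketch of the arithmetic. The function name and the figure of 250 working days per year are assumptions made for illustration, not part of any standard.

```python
def lost_work(users: int, days_lost: float, working_days_per_year: int = 250) -> dict:
    """Convert 'every user loses N days' into person-days and years of work.

    working_days_per_year is an assumed figure; adjust it to your organisation.
    """
    person_days = users * days_lost
    return {
        "person_days": person_days,
        # Expressed in calendar years (365 days each).
        "calendar_years": person_days / 365,
        # Expressed in years of actual working days.
        "working_years": person_days / working_days_per_year,
    }


if __name__ == "__main__":
    result = lost_work(users=1000, days_lost=1)
    print(result)
```

With 1000 users and one lost day, this gives 1000 person-days, which is about 2.7 calendar years or 4 working years under the assumed 250-day working year. Multiply the person-days by what a day of one employee's time costs you to get a rough price tag.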
5) How do you back up?
Will you be using tapes? Will you purchase a storage array and use it for backup purposes? Will you be using optical media? Are you going to do this across the network? How will the network cope with the load generated by the backup?
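A quick way to answer the network question is to estimate how long a full backup would occupy the link. The sketch below is a rough back-of-the-envelope calculation in Python; the function name and the 70% link efficiency are assumptions, and real throughput depends on your backup software, protocol overhead, compression and the speed of the disks or tapes at both ends.

```python
def backup_window_hours(data_gb: float, link_mbit_per_s: float,
                        efficiency: float = 0.7) -> float:
    """Estimate how many hours it takes to push data_gb across the network.

    efficiency is an assumed fraction of the nominal link speed that the
    backup traffic actually achieves.
    """
    if link_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    data_megabits = data_gb * 1000 * 8           # decimal GB -> megabits
    effective_rate = link_mbit_per_s * efficiency
    return data_megabits / effective_rate / 3600  # seconds -> hours


if __name__ == "__main__":
    hours = backup_window_hours(data_gb=2000, link_mbit_per_s=1000)
    print(f"Estimated backup window: {hours:.1f} hours")
```

For 2 TB over a gigabit link at the assumed 70% efficiency, the estimate is a little over 6 hours. If that does not fit into your quiet hours, you need a faster link, a dedicated backup network, or incremental backups.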
What software will you use? How much does it cost? What about the total cost of ownership? Will future versions be supported and be able to access older backups?
Are you storing all the information needed to restore your systems? How about permissions, or the disk volume / partition configuration?
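If your backup tool does not preserve ownership and permissions by itself, one option is to save them in a separate manifest next to the backup. The following is a minimal Python sketch for a POSIX system; the function name and the manifest layout are made up for this example, and it does not cover ACLs, extended attributes or the partition layout.

```python
import json
import os
import stat


def collect_metadata(root: str) -> list:
    """Walk root and record mode, owner, group and mtime for every entry."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe symlinks, do not follow them
            entries.append({
                "path": os.path.relpath(path, root),
                "mode": oct(stat.S_IMODE(st.st_mode)),  # e.g. "0o640"
                "uid": st.st_uid,
                "gid": st.st_gid,
                "mtime": int(st.st_mtime),
            })
    return entries


if __name__ == "__main__":
    # Print the manifest for the current directory as JSON.
    print(json.dumps(collect_metadata("."), indent=2))
```

Store the resulting JSON together with the backup set, so that a restore can put the permissions back even if the files themselves were copied by a tool that ignores them.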
6) Where will you store your backup?
Backing up to the same disk, or to another disk in the machine you're backing up, is usually a very bad idea. So is leaving the tape in the tape drive or on top of your machine. Think about it. That tape probably contains vital and confidential information. You don't want it to go up in flames with your production systems, or get stolen along with the confidential business records it contains. You really need to evaluate all possibilities in terms of on-site storage (fire-proof tape safe, tape robots or even a simple storage cabinet) as well as off-site storage.
Also, consider online backup vs. offline backup. Consider something as simple as an rsync-based snapshot setup (rsync-snapshot, rsnapshot, rsync with snapshots) or as complex and powerful as Sun's StorageTek Availability Suite.
7) Monitoring and testing
Don't just dump the fs content to a tape and leave it there. Test and monitor your backup and restore procedures.
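One simple test is to restore a backup to a scratch location and compare it with the source, file by file. Here is a minimal Python sketch of that idea using SHA-256 checksums. The function names are made up for this example, and it assumes the source has not changed since the backup was taken (for example, because you are comparing against a snapshot).

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_root: str, restored_root: str) -> list:
    """Compare every file under source_root with its restored copy.

    Returns a list of problems; an empty list means every source file
    was found in the restore with identical content.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(restored_root, rel)
            if not os.path.isfile(dst):
                problems.append(f"missing: {rel}")
            elif sha256_of(src) != sha256_of(dst):
                problems.append(f"mismatch: {rel}")
    return problems
```

Run something like this on a schedule and alert on a non-empty result. It only checks file content; permissions, ownership and files that exist only in the restore are not covered here.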
Test your backup software. Some companies even go so far as to test a new backup solution for years in parallel to the old solution before committing themselves to using only the new backup method.
Always keep your backup policy up to date. Make sure your plans and strategies reflect real-life situations.
8) Restoring data
Backups in themselves have little importance. It's restoring the data that matters. How will you get data off those tapes / CDs? How long will it take? Will you have your system up and running to a satisfactory baseline, fast? How about restoring individual files from a certain point in time? Say a user requires a 3-month-old email. What will you do then, restore the whole mail database from 23 tapes by restoring a full backup, the incrementals, the differentials, etc.? That wouldn't be too much fun, now would it...
It's usually a good idea to take a layered approach to backup. On Windows, for example, you have System Restore to revert to a previous state of the operating system, Shadow Copies to restore older versions of files, ntbackup for system state, registry and filesystem backups, and Windows Complete PC Backup (or tools like Norton Ghost) for bare metal recovery.
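To see why a point-in-time restore can involve so many tapes, it helps to work out which backup sets are actually needed. The Python sketch below assumes the common definitions: a differential holds everything changed since the last full backup, and an incremental holds everything changed since the most recent backup of any kind. Backup products differ on these terms, so treat the Backup class and restore_chain function as an illustration, not as how any specific tool behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number on which the backup was taken
    kind: str  # "full", "differential" or "incremental"


def restore_chain(backups: list, target_day: int) -> list:
    """Return the backups to restore, in order, to reach target_day."""
    candidates = sorted((b for b in backups if b.day <= target_day),
                        key=lambda b: b.day)
    fulls = [b for b in candidates if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the target day")
    base = fulls[-1]                      # latest full backup
    after_full = [b for b in candidates if b.day > base.day]
    chain = [base]
    start = base.day
    diffs = [b for b in after_full if b.kind == "differential"]
    if diffs:
        chain.append(diffs[-1])           # only the latest differential
        start = diffs[-1].day
    # Every incremental taken after the last full or differential.
    chain.extend(b for b in after_full
                 if b.kind == "incremental" and b.day > start)
    return chain


if __name__ == "__main__":
    history = [Backup(0, "full"), Backup(1, "incremental"),
               Backup(2, "incremental"), Backup(3, "differential"),
               Backup(4, "incremental"), Backup(5, "incremental")]
    for backup in restore_chain(history, target_day=5):
        print(backup.day, backup.kind)
```

In this made-up schedule, restoring to day 5 needs four sets: the full from day 0, the differential from day 3 and the incrementals from days 4 and 5. The longer you go between fulls and differentials, the longer that chain gets.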
Make sure your mail server stores emails for a certain period of time. Have your log servers store logs and rotate / archive them at a specific interval. Here is where products like QFS / SAMfs really shine.
The idea is simple. It's a lot easier to restore and manipulate files from lower backup levels than it is to manipulate bare metal recovery backups or full filesystem backups.
Data within a company tends to grow at a compounding, roughly exponential rate. As data grows, so will your backup needs. You will need to plan ahead and make sure all your backup systems are scalable and can handle growth. If you hit a hard limitation (some filesystems, for example, can't grow beyond 1-2 TB), you're stuck.
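A rough capacity projection helps with that planning. The Python sketch below assumes a constant compound monthly growth rate, which is a simplification; the function name and the sample figures are made up for illustration.

```python
import math


def months_until_full(current_tb: float, capacity_tb: float,
                      monthly_growth: float) -> int:
    """Months until data growing at a compound monthly rate reaches capacity.

    monthly_growth is a fraction, e.g. 0.05 for 5% per month.
    """
    if current_tb >= capacity_tb:
        return 0  # already at or over capacity
    if current_tb <= 0 or monthly_growth <= 0:
        raise ValueError("current size and growth rate must be positive")
    # Solve current * (1 + g)^n >= capacity for the smallest whole n.
    months = math.log(capacity_tb / current_tb) / math.log(1 + monthly_growth)
    return math.ceil(months)


if __name__ == "__main__":
    print(months_until_full(current_tb=10, capacity_tb=40, monthly_growth=0.05))
```

With 10 TB today, 40 TB of backup capacity and an assumed 5% monthly growth, the capacity is reached in 29 months. Feed it your own measured growth rate and rerun it whenever the numbers change.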
Also, make sure that while your systems are scalable, they don't grow beyond your power to manage them. Keep things simple and easy to understand. Document everything.