Fog Creek Software
Discussion Board




Murphy's Law & Abstractions

Boy, did you ever take away the wrong lesson from Chapter One.

The lesson you should have learned was that the backup software is an abstraction layer, per your earlier article.  Suppose it had run for 6 hours on a 60GB disk and succeeded.  Would that be good?  How much productivity did you lose in those 6 hours?

What you've learned is that disk failure really sucks, but by converting to RAID, all you've done is made it suck less often.  It's still going to suck when your controller fails.  And if you have an array that depends on the controller for its configuration, it could really, really suck.  Trying to solve the problem by eliminating failure will never work; the real goal is to minimize the costs of failure.

I'm now favoring (and implementing, as budget allows) a two-phase backup strategy.  In these days of $1/GB hard disks, daily mirroring to an external disk covers most routine failures, including accidental file deletion.  Tape is stored off-site for the really significant disasters.

I should note that in order to make backup-to-disk viable, you need to identify things that aren't safely backed up that way, such as SQL or Exchange databases, and provide an alternate strategy for them; we still rely on tape backups for those.

By the same token, we have a nice expensive file server with about a 120GB RAID5 array.  We copy the data nightly to an external IDE drive.  It's also backed up to tape, but the disk is our first avenue for restores.  If nothing else, we'll know whether it will succeed or fail within a few minutes, whereas with tape you can spend 15 minutes just loading the proper tape and bringing up the index.
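A nightly mirror job like that can be sketched in a few lines of script.  Everything here is illustrative rather than taken from our actual setup: mktemp stands in for the real file server share and external drive, and the one-night rotation is just one way to do it.

```shell
#!/bin/sh
# Hypothetical sketch of a nightly disk-to-disk mirror.
# SRC stands in for the file server's data; DEST for the external disk.
SRC=$(mktemp -d)
DEST=$(mktemp -d)
echo "quarterly report" > "$SRC/report.txt"

# Keep last night's copy before refreshing, so a file deleted or
# corrupted today can still be pulled from yesterday's mirror.
rm -rf "$DEST/yesterday"
[ -d "$DEST/today" ] && mv "$DEST/today" "$DEST/yesterday"

# Make a fresh full copy; -p preserves timestamps and permissions.
mkdir "$DEST/today"
cp -pR "$SRC/." "$DEST/today/"
```

The point of the extra "yesterday" directory is that a plain mirror happily propagates an accidental deletion; keeping one prior generation on the same disk covers that case without touching tape.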

Regardless of what strategy is adopted, the lesson is that backup to tape adds several layers of abstraction, both in hardware and software.  And experience tells me that they're not to be relied upon.  More often than not, you'll be forced to learn about the backup software's inner workings while sweating an important restore.

bobk
Tuesday, January 28, 2003

Backing up to disk is okay until you type something that "backs it up" the wrong way and wipe out the day's live data. Oops.

been there, done that
Tuesday, January 28, 2003

I'm considering using Dantz Retrospect to back up our servers. I would use their "progressive backup" feature to trickle changes over the Internet to an offsite location where I would store the backups on a large RAID array.

Their server edition claims to have the ability to back up open SQL databases (apparently this uses NT copy-on-write, which allows backup programs to get a snapshot of a file instantaneously).

Joel Spolsky
Tuesday, January 28, 2003

Well, Joel, it's probably not any consolation but your disaster has served as a wakeup call for me. (Probably others, as well.)

I've got decent backup strategies in place, but I'll admit I've never tested the recovery process. My client won't shell out the $$$ for an expensive solution, but given the ideas that have been tossed out during this discussion, I'll be able to piece together a more solid backup and recovery plan.

Mark Hoffman
Tuesday, January 28, 2003

Joel,

do not bother with a 3rd-party SQL Server backup feature. This is an often-discussed topic on the excellent SSWUG mailing list, and the consensus there seems to be that best practice is to use the backup utility that comes with SQL Server to do a backup to disk, and then back up that backup file in your global routine. This recommendation is supported by people running major SQL Server sites.
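The first step of that approach is a one-line T-SQL command; the database name and path here are purely illustrative:

```sql
-- Use SQL Server's own backup utility to dump the database to a
-- plain file on disk (name and path are hypothetical examples).
BACKUP DATABASE Northwind
    TO DISK = 'D:\Backups\Northwind.bak'
    WITH INIT;  -- overwrite last night's file rather than appending
```

The nightly file-level backup then just picks up the .bak file along with everything else, so neither the tape software nor the disk mirror ever has to understand SQL Server's internals.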

Just me (Sir to you)
Wednesday, January 29, 2003
