Fog Creek Software
Discussion Board

Disaster Recovery

Doesn't anyone else have disaster recovery plans here?  The more explanations I read from Joel about his recent problems, the more I wonder whether Fog Creek even has one.

We're a smaller company than Fog Creek, but if every single one of our machines (about 10) were toasted simultaneously today, the basic testing we've done indicates we could be back up and running faster than Joel recovered from a single hard-drive failure.

We have redundant Internet connections (with automatic switchover) from two different providers (and no, they don't resell the same service), a gas power generator, and a UPS on every machine that can run long enough for someone to get to the office and start the generator. We also have a restoration plan for various machine meltdowns.

I know that we are somewhat unusual in the thoroughness of our plan (and it's not even close to where we want to get it) - but are we unusual in even having a plan?

And if you do have a plan, how thoroughly and how often have you tested it (other than during an actual disaster)?

Monday, January 27, 2003

We have a huge UPS for the AS/400 and several recovery sites (one is in New York City). Our AS/400 is mirrored to an off-site, out-of-state location. We can be up and running within a day even if our office is totally destroyed.

Of course, our company is a *little* bigger than Fog Creek (roughly 3,000 employees).

Monday, January 27, 2003

Where are the smoky, smelly exhaust and the noise of the generator supposed to go? It must be part of a switchover to a remote site. You wouldn't want that inside a building.

Brian R.
Monday, January 27, 2003

Ours is on the roof :)

Thinking about it, I suspect not all landlords would allow drilling a hole in the roof and running the cable. I'm not sure what the other options for power backup would be (the affordable ones for a small company, anyway).

Monday, January 27, 2003

Reading this story, it seemed to me that part of what's going on may be that Microsoft Windows, in my admittedly limited experience (mostly with Win NT 4.0 Workstation), is remarkably hostile to backups.

I was the Sun sysadmin for our lab group in grad school, and we had a flaky system corruption problem. (OS bug: writing to /dev/null from Fortran programs silently killed something, maybe the boot sector, so the system could not be rebooted the next time.) I'm sure I restored the system at least three times, and my best guess is six. It was a royal chore, but it was straightforward. Even if your only tool is tar, pulling out all the files you need to back up just isn't that hard. (And of course much more sophisticated backup tools than tar are available.)
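As a rough illustration of the tar-style approach described above, here is a minimal sketch that uses Python's standard `tarfile` module in place of the `tar` command-line tool. The helpers `backup`, `restore`, and `verify` are hypothetical names, and the sketch assumes everything worth saving lives under a known list of directories. It also does a trial restore with a byte-for-byte comparison, which is one cheap way to test a backup before you actually need it.

```python
import tarfile
from pathlib import Path


def backup(paths, archive):
    """Pack every file or directory in `paths` into a gzip-compressed tar archive.

    Hypothetical helper: assumes everything worth saving lives under `paths`.
    """
    with tarfile.open(archive, "w:gz") as tar:
        for p in paths:
            # Store entries under each path's own name, so the archive
            # can be restored into any destination directory.
            tar.add(str(p), arcname=Path(p).name)


def restore(archive, dest):
    """Unpack `archive` under `dest` and return the member names it contained."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Newer Pythons support extraction filters; "data" rejects absolute
        # paths and other unsafe members. Older versions skip the filter.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return names


def verify(original, restored):
    """Return relative paths that are missing or differ after a trial restore."""
    original, restored = Path(original), Path(restored)
    problems = []
    for f in sorted(original.rglob("*")):
        if f.is_file():
            rel = f.relative_to(original)
            copy = restored / rel
            if not copy.is_file() or copy.read_bytes() != f.read_bytes():
                problems.append(str(rel))
    return problems
```

A trial run would back up a directory (say, a hypothetical `/home/lab`), restore the archive into a scratch directory, and check that `verify` returns an empty list. Note that this only covers plain files; it would not capture the kind of Registry or SAM state discussed elsewhere in this thread.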

I was my own de facto sysadmin when I was working on Windows NT. I bought a backup utility (one advantage of Windows is that software is easy to find and cheap to buy), did my best to RTFM, and backed everything up. Then I got hit by a flaky system corruption problem. (Putting a space in a filename in a particular entry box in Visual Studio trashed Visual Studio beyond repair: it could be neither uninstalled nor overwritten by a new copy, and since Visual Studio was the reason I had to use Windows, it was reinstall time.) This time, not only was the original backup incomplete (key stuff wasn't backed up; I've mercifully forgotten the details, but it was at least analogous to key Registry entries, or all Registry entries, getting lost), but when I realized the problem and set out to find a way of backing up that wouldn't have it, it was *really* *hard* to do. I spent more than a working day over a calendar week and just gave up. (I found a way of backing up the source of the program I was writing and decided not to keep any other state on that machine, so after any future failure I'd just reinstall Windows and VC++ and be good to go.)

My informal impression from more knowledgeable Windows people is that Microsoft intentionally makes it difficult to make complete backups because they're so hostile to people copying and mirroring their software installs. After all, a truly complete backup would be something that you could use to make another system. I don't know how true this was, or how true it is today, but it seemed like the best explanation of my experience, and if it is still true, it would explain why there's so much "I backed it up and it didn't help" in this story.

I would find it hard to believe this could remain true for years without enterprise customers looking really hard for ways to leave Microsoft. However, I know that remote sysadmin (at least in the few bits of the Microsoft-using world where I have current, personal, anecdotal access) is still screwed up. It's some combination of admins hiking around and sending out emails with checklists ("open this, click this box...") for every single workstation user to execute. I hear from the 'net that MS software itself has improved in this regard, so that in principle it's become easier to avoid this. But for whatever reason, my (admittedly very small and unscientific) survey of users doesn't suggest it's being done. So maybe Microsoft customers simply have different priorities from mine (in minor respects like backups, and remote admin on sites with thousands of users :-)...

william newman
Monday, January 27, 2003

Hell, I know this is old, but the slickest I've seen is the backup and restore of a VMS system (a cluster of MicroVAXes) from TK50s. The tape itself was bootable, and to restore, you booted from the tape. It just laid everything down on the drive neat as you please. Those were old drives (even then), and the third-party VMS defrag utilities were too expensive for our department budget, so one nice side effect of these restores was that the restored drive was also defragmented.

Personally, I haven't had difficulty backing up data from Windows boxes, but I've not had much luck doing real full backups of the entire system in Windows. It's enough of a PITA that it ends up being faster to lay down a new image from Ghost and restore the data. I'm certainly no expert at it, so I may be doing things wrong, but this seems to match what I hear from other folks as well, so maybe it's not just me.

Monday, January 27, 2003

Windows 2000 is schizophrenic about its security model, which is one of the difficulties in doing incremental or content backups.

NT had something called the SAM (Security Accounts Manager), which encrypts the key for each user account. Essentially this means that if a user account gets screwed up, or you reinstall the OS, you get a different SAM, and even if you create the same user name and details, you get a different account.

Windows 2000 Active Directory changed that so that server accounts were handled on the server.

But those accounts are still verified against a SAM database on the local machine and on the server itself you have a local SAM.

So you can't incrementally back up a Win2K workstation (I guess XP is just the same) and guarantee the same access. You have to do an image backup.

Given administrative rights, you can always reassign permissions so that the new SAM accounts get access to the directories again, but it's always a pain, and much more so when it's the server that went down.

Simon Lucy
Tuesday, January 28, 2003

I am trying to put together a disaster recovery plan. It is just about finished, but...

One scenario I would like to plan for is the worst case, where we have a fire and lose everything except the backup tapes stored off-site. We have a mixed-mode Win2K DC and two NT4 BDCs. The Exchange 5.5 mail server is on one of the NT4 servers.

Say all the hardware is destroyed, and I re-create an NT4 Server PDC on new hardware (from the insurance company). I would have the SAM on an ERD disk, but my question is: how do I get the user accounts onto the new computer?

Monday, September 8, 2003
