Fog Creek Software
Discussion Board




"Sometimes the server doesn't come back up"

A question for the sysadmins with more experience than me - in *my* experience as an SA, my #1 priority was always to guarantee the server would come back up properly on a cold start or restart (I've always handled colos and don't feel like driving in).
If I had a box that "sometimes didn't come back up" then that goes straight to the top of the list and I bang on it until it does come back up, every time.

Have I just been lucky? Are there truly cases where a server may have an intermittent boot failure and you "just deal with it"?

Philo
Thursday, June 24, 2004

Personally, I have rarely had a 'co-located' server, so if I had to bang on it to get it to come up, it was not a HUGE deal. 

On the other hand, the hardware side of me wants to be able to start from scratch, and have a working system come up from power-on to application with no human interference. 

Plus I HATE those 3 AM phone calls where a lightning storm has reset power, and the server won't come up.

It is a small niche knowledge-set, dealing with setting programs as auto-start 'Services' on OS Startup.  But you do it for your web-server, so you just have to deal with the gotchas.

AllanL5
Thursday, June 24, 2004

By the way, I did have to deal once with a copy of Sybase Replication Server 10.5, which would only auto-start on a WIN-NT 3.51 platform.  So, as of 6/24/2004 we STILL have a working, running Win-NT 3.51 platform, just because that's the only place it will auto-start. 

Neat, huh?

AllanL5
Thursday, June 24, 2004

Well it depends on why it won't come back up and how much it would cost to fix it.

If it's a software issue like a driver or a really unruly service, then yeah, I would beat on it until I got it working.  If it's an intermittent hardware issue, then that's a whole nother bowl of cheese dip.

Currently, we have an old HP server that will refuse to get all the way past POST 1 time in 10 after a soft reboot.  It actually has to power down.  It'd cost more than we're willing to pay to fix it, so sometimes we have to go push the button.  Then again, if it were a colo on the other side of town, we'd be in a bigger hurry to get it fixed.

Steve Barbour
Thursday, June 24, 2004

We had an incident a couple of weeks ago where the servers were down and not coming back up.  At first it seemed like a network issue because some machines were available and others weren't.  Someone had to drive in the middle of the night and found the problem right away.  The A/C for the room gave out and the machines were shutting down from the high heat.  The Sun boxes and high-end IBM servers were smart enough to shut down for self-preservation, and in those circumstances they don't come back up without a human.  However, the Dell PC in the room never did shut down, and now reboots at random.

Now we have dual A/Cs and you can just about keep your lunch in the server room instead of the fridge!

Bill Rushmore
Thursday, June 24, 2004

It is a no-brainer to set up your server to power back up and have services running.

The problem is balancing the need to be up and running automatically with good diagnostics and recovery.

Sometimes when a server goes down, it is better to have it stay down than for it to come straight back up.
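
One way to picture that balance, as a minimal sketch and not anything taken from a real tool: allow automatic restarts, but only a limited number within a time window, and after that leave the box down so a human can look at it. The class name `RestartPolicy`, its parameters, and the thresholds are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestartPolicy:
    """Hypothetical policy: restart automatically, but give up if the
    machine keeps failing, so the fault can be diagnosed by a person."""

    max_restarts: int = 3          # assumed limit, not from the thread
    window_seconds: float = 600.0  # assumed window, not from the thread
    _history: List[float] = field(default_factory=list)

    def should_restart(self, now: float) -> bool:
        # Forget restarts that happened outside the sliding window.
        self._history = [t for t in self._history
                         if now - t < self.window_seconds]
        # Too many recent restarts: stay down and wait for a human.
        if len(self._history) >= self.max_restarts:
            return False
        # Otherwise record this restart and allow it.
        self._history.append(now)
        return True
```

With `max_restarts=2` and a 100-second window, the first two failures would trigger a restart and a third in quick succession would not; once the window has passed, automatic restarts are allowed again.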

Tapiwa
Thursday, June 24, 2004

With some effort you could have turned that post into a haiku.

RP
Thursday, June 24, 2004

This is off the subject, but now that Joel has more than one server he'll start to see where Windows falls apart in the data center.  I've literally spent the night in a data center trying to update Windows on 20 servers.  Makes you realize why firms like Google don't run Windows.

christopher baus (www.baus.net)
Thursday, June 24, 2004

Or, he might see that if properly managed according to what Microsoft recommends, not your greybeard sr. IT staff, everything will be a-ok.

I've worked for several large companies that had either a 100% MS presence, or a very large %age MS presence.  I have never, not ever, seen the kinds of nightmare situations arise that I see MS blamed for here and elsewhere... what's the deal?  Have I been lucky?  Surrounded by competent IT architects?  Blessed?  Worked with large enough companies such that it was budgeted so everything could be done correctly?

I'd almost like to have some sort of nightmare scenario go down (almost) just so I could understand the vitriol flying from sysadmins from time to time.

Almost... but not quite.

Greg Hurlman
Thursday, June 24, 2004

Ok, well, Microsoft recommends one admin per server then.  Yep.  You're right, no problems.

christopher baus (www.baus.net)
Thursday, June 24, 2004

Here's a nightmare scenario.  Patch SQL server from two states away.

christopher baus (www.baus.net)
Thursday, June 24, 2004

Philo, Dell racks really do have pretty messed up remote-control power cycling; it's a pretty common experience for many owners. If you want to work around it until it's a non-issue, one way is to buy a remote-controllable power cycler that DOES work.

overweightnerd
Thursday, June 24, 2004

Christopher, I maintained two SQL boxes one state away and two other SQL boxes 1,000 miles away (including patching them). That was for two years.

Being a sysadmin to seven Windows boxes (including ASP.Net, several custom .Net apps, SQL Server and Exchange) was really the least of my worries - took me an average of about two hours a week.

I kinda feel like Greg - my experience was so easy, maybe I was doing something wrong?

Philo
Thursday, June 24, 2004

What were you using for remote access, and what was that user experience like?  For me it means setting up a VPN so I can administer some 20+ Windows boxes remotely using VNC.

Almost every patch from uSoft requires a reboot, and this always makes me nervous.  I don't even want to go into why this is necessary, but once you've done "apt-get update" you'll see that it doesn't have to be this way.  It doesn't matter that there is no fancy rendered button that says "update."

Personally I think leaving administrators no way to manage a box without a locally running UI is a significant flaw in Windows.  Windows is the only major network product that doesn't support SSH out of the box.  When I'm working on slow or unreliable links, the last thing I want is the overhead of sending the UI down to my client.  I guess we have Steve Ballmer to thank for that. 

christopher baus (www.baus.net)
Thursday, June 24, 2004

windows has all sorts of remote management tools. example: go into the event viewer. pick Connect to Another computer. voila, no UI being sent, just data.

Of course, the servers still tend to flake out for unexplained reasons, and it's usually easier to use Remote Desktop.

mb
Thursday, June 24, 2004
