Fog Creek Software
Discussion Board




"Not broke? Don't fix!" - I don't get it...

In the "Ask Joel" forum, I came across this ( http://discuss.fogcreek.com/newyork/default.asp?cmd=show&ixPost=6151&ixReplies=16 ):
"1b) installing a patch can break a working server. "If it ain't broke don't fix it" is one of the 10 Commandments of good system administration."

I don't get it.
For one, it *is* broke, that's why they sent out the patch in the first place.

Secondly, it's an accident waiting to happen: You also buy the insurance *before* your house burns down, and you use the seat belt *before* you rear end the 5 ton truck. If not, you're in deep didgery-doo when "something" hits the fan. Don't administrators believe in Murphy's law?

And third, when "something" hits the fan, your boss's boss will ask questions. Your boss understands, he was a sysadmin himself some years ago, but your boss's boss is a suit, and he gets grumpy when his 10,000 customers a day get their credit card info stolen, and the press people start to call him with bugging questions about patch no. 19443, and the (lack of) competence in his company.

Fourth, when you install patches before "something" hits the fan, you get to choose *when* to do it. When evil-cracker-virus-worm no. 873475 is spreading and swallowing your company's resources, you have to patch now... well, 5 minutes ago, and not just 1 server, but all of them.

The above is my perception of a real sysadmin's challenges - I'm not a sysadmin myself, so if I'm way off, just tell me about the real life of a sysadmin without too many harsh words. Thanks.

Martin A. Bøgelund
Thursday, May 13, 2004

Your view is a little too simplistic.  For one, you seem to assume that all patches do one and exactly one thing -- fix what they say they fix.  That is not true.

Also, "broke" doesn't apply to the OS.  It applies to the entire system.  There is a difference between the OS being broken and the entire system you are in charge of being broken.  One does not imply the other.

If you have ever administered a complex system, you have a built-in conservatism.  You don't go installing patches willy-nilly because people tell you to.  You probably have dozens of products that can affect your system as well.  Keeping track of every little patch can be a real pain.

Roose
Thursday, May 13, 2004

I'm sure I read somewhere that 50% of bug fixes introduce a new bug. If that figure is accurate then I think it's reasonable to be cautious about installing fixes and service packs.

Tony Edgecombe
Thursday, May 13, 2004

50%?

Are we talking all systems here, or just a subset (Windows)?

Martin A. Bøgelund
Thursday, May 13, 2004

Roose,

I can follow you as far as this:
System is broken doesn't necessarily mean that OS is broken. But
              OS broken => System broken,
non?

Martin A. Bøgelund
Thursday, May 13, 2004

Not installing a patch or upgrade is perfectly acceptable if you mitigate the problem in another way. Hell, a patch might even be for something you aren't even running.

Why would you change a working setup unnecessarily? What is there to gain?

Just me (Sir to you)
Thursday, May 13, 2004

A lot of times you're not even running the services the patch is for.

If your customers' credit card numbers are stored unencrypted, then you have bigger problems than the latest patch.

If the patch makes your system unworkable because the CPU is running at 100% you're messed. At least if you wait until the worm hits you, you can point out all the other companies in the same shit as you.

Stephen Jones
Thursday, May 13, 2004

Well, I can see your point of view alright. I've been on both sides, sys admin and developer. I have gone through cases of patches BADLY breaking services, and learned early to upgrade one box at a time; if anything goes wrong, pull the affected box and let the failover do its work.

Now, from the sys admin point of view: what if $x-patch fixes the webserver and mailserver, but I use the box as an application server running 3rd party software? Then I'm in no rush to upgrade. I also switch off services if I'm not using them. The one exception where I will update is for security fixes.

It's simple: every sys admin, even one who has been around for only a year with, say, 20 servers, has probably seen a few patches break things, so they often err on the side of caution.

As a developer, I know not all patches may be applied. It's a bitch, but there's not a lot I can do, so I have extensive versioning inside each module and a script which will extract it. In the event of a customer error, they run this script; it checks all the versions and some other things, and its output comes with the bug report.
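A minimal sketch of what such a version-collecting script could look like, assuming Python modules that expose a `__version__` attribute. The function names `collect_versions` and `format_report` are made up for illustration and are not the script described above.

```python
import importlib
import platform


def collect_versions(module_names):
    """Return a dict mapping each module name to its reported version.

    Assumption: modules advertise their version in `__version__`.
    Modules that cannot be imported are recorded rather than raising,
    so the report is still useful on a half-patched machine.
    """
    report = {"python": platform.python_version()}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        report[name] = str(getattr(module, "__version__", "unknown"))
    return report


def format_report(report):
    """Render the report as sorted 'name: version' lines for a bug report."""
    return "\n".join(f"{name}: {version}" for name, version in sorted(report.items()))
```

The customer would run something like `print(format_report(collect_versions(["json", "sqlite3"])))` and paste the output into the bug report.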

fw
Thursday, May 13, 2004

I can see why you wouldn't patch something that you aren't running, but that's not what I'm talking about.

I'm talking about sysadmin X who says "I know we run this component and I know there is a patch available for certain bugs, but we won't patch until somebody reports having a problem that could be related to that specific bug/patch"

From what I know, this is not that rare.

Take the vulnerabilities exploited lately. As soon as it's known what hit them and how, a lot of clever people pop up and say "Oh that vulnerability? I patched as soon as the patch was available - so these guys who were hit  really asked for it".
And nobody stands up for these non-patchers, saying "there was no problem until now, so why fix?"
It all diverts into "M$ can't make secure software" and "Linux exploit? Ha! You get what you pay for!"

So, why, oh why, is it common sense, nay, "one of the 10 commandments of good system administration", *not* to buckle up until after you crash?

Martin A. Bøgelund
Thursday, May 13, 2004

If the patch is for a vulnerability in service X running on port 1433, and I have port 1433 blocked on a HW firewall right in front of the machine, why should I apply the patch? I might over time, in a lull period, maybe as a "defense in depth" effort or to get "port 1433 blocked" off the requirements list, but I am not going to rush like mad to get this deployed in 24 hours just for the sake of it.

Applying a patch is one of many possible responses to a vulnerability. That is why decent security bulletins have these "Mitigating factors" and "Workaround" sections. You select the solution which fits your environment.
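The reasoning above can be sketched as a small triage function. This is only an illustration under simple assumptions: the `Vulnerability` class, the `choose_response` name, and the three response strings are all hypothetical, and a real decision would weigh far more than one port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    service: str  # name of the affected service (hypothetical label)
    port: int     # port the vulnerable service listens on


def choose_response(vuln, running_services, firewall_blocked_ports):
    """Pick a response to a vulnerability based on local mitigations."""
    # Not running the affected service at all: the patch is not urgent.
    if vuln.service not in running_services:
        return "not applicable"
    # Service runs, but the port is blocked upstream: patch at leisure.
    if vuln.port in firewall_blocked_ports:
        return "patch in next maintenance window"
    # Service runs and is reachable: no mitigation, so patch promptly.
    return "patch now"
```

For example, `choose_response(Vulnerability("service-x", 1433), {"service-x"}, {1433})` returns `"patch in next maintenance window"`.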

Just me (Sir to you)
Thursday, May 13, 2004

If it ain't broke  --  break it!!

My Cousin Vinniwashtharam
Thursday, May 13, 2004

Buckle up before you crash is a bad analogy. It is more a question of taking large chunks of the car apart, banging them into a different shape with a mallet, and then hoping they fit back together again. You can understand somebody saying he'd rather wait until the car has a prang before doing it.

And often that is the diplomatic way. I have for example given up sending self extracting zip files or Access mdb's by email because the different Outlook Security patches mean a quarter of the time or more they don't get delivered. Now imagine the attitude of a guy who finds he's lost a couple of days on the schedule because his sysadmin has blocked necessary files. He's not going to be happy. On the other hand, if the sysadmin can say - "Hey remember that virus that was in the news last week; well that's why we've had to change things." - the guy is a lot more likely to accept it.

Stephen Jones
Thursday, May 13, 2004

O hell (hell being my word of the day).

Did I ever tell you about how my nephew showed up and "upgraded" my wife's 7+ year old computer with every Microsoft, Intel, nVidia patch he could find?

It became as useful as a brick.  After recovering all her data from the hard-drive I bought her a refurb'd Dell notebook and the old PC now sits alone as a Samba/print server for the house (running redhat).  A really nice way to put an old PC out to pasture.

"Fixes" in software are often worse than the problem.

hoser
Thursday, May 13, 2004

Hoser,

at least now when your nephew drops by he won't have any patches to sneak onto it.

"Dear Red Hat Linux user;

In accordance with our errata support policy, our final Red Hat Linux distribution, Red Hat Linux 9, has now reached end-of-life for errata maintenance.  This means that as of May 1, 2004 we will not be producing new security, bugfix, or enhancement updates for this product.
...
"

:-)

Just me (Sir to you)
Thursday, May 13, 2004

Wow... maybe I'm way out of touch, but I think every large (i.e. 250+ Windows servers, 12+ midrange, mainframe sysplexes, etc...) IT shop follows this plan on patches:

-Establish some level of how out of date the shop is willing to be (by version level (i.e. n-1) or time frame (30 days)).
-Have multiple environments to be able to apply patches and version upgrades in (at least TEST, QA, and PRODUCTION).
-Roll patches through the different environments and do additional testing at each environment (IT testing in TEST, user testing in QA, etc...).
-When rolling into PRODUCTION, do it after hours and have a plan for rolling back.

I've seen plenty of small shops that do this as well. My background is financial services (where IT is the lifeblood) and "big" IT vendors, so maybe this isn't what other industries are doing. However, this addresses the concerns about incompatibility and keeps the shop in a state that vendors are comfortable supporting.
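The plan above can be sketched roughly as follows. This is a simplified illustration, not a description of any real shop's tooling: `roll_out` and the `apply`, `verify`, and `rollback` callables are hypothetical stand-ins for whatever deployment and testing steps a shop actually uses.

```python
ENVIRONMENTS = ["TEST", "QA", "PRODUCTION"]


def roll_out(patch, apply, verify, rollback, environments=ENVIRONMENTS):
    """Roll a patch through the environments in order.

    `apply`, `verify` and `rollback` are caller-supplied functions taking
    (patch, environment). The rollout stops at the first environment whose
    verification fails, rolls that environment back, and never touches the
    later ones.

    Returns (environments_patched_successfully, failed_environment_or_None).
    """
    completed = []
    for env in environments:
        apply(patch, env)
        if not verify(patch, env):
            rollback(patch, env)
            return completed, env
        completed.append(env)
    return completed, None
```

If verification fails in QA, the call returns `(["TEST"], "QA")` and PRODUCTION is left alone, which is the whole point of having the earlier environments.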

Mark L. Smith
Thursday, May 13, 2004

Martin, you said

"
System is broken doesn't necessarily mean that OS is broken. But
              OS broken => System broken,
"

No, I don't think either direction is true.  OS broken doesn't imply the system is broken because maybe the bug only occurs under some conditions that don't occur in your system.

Maybe there is a bug when you remote login with 1024 users or something.  So technically the OS is broken.  Say your system only has 10 users, then your system is not broken, and you might be wise not to patch that problem.

Roose
Thursday, May 13, 2004

Roose,

If I was the sysadmin, I would still think of my system as broken when my OS was broken, even though our local usage of the system would never cause the broken feature to show.

If I slept in a broken car and never used it for anything else than that, then I would still sleep in a broken car. And if my friend would ask me if he could borrow my car, I would not assume that he wanted to use it the way I use it, and I would tell him that it is broken, although it was OK for my usage.

But I think I hear what you (all of you) are saying:
Systems are complex, and tend to be in a kind of solid state - introducing changes/patches might change this solid state in an unwanted way, so there has to be a good reason for the change.

Thanks for the answers.

Martin A. Bøgelund
Thursday, May 13, 2004

Just Sir (me to you),

Actually, it was RH 8 that I stuck on it.  It's not exposed to the external network, and I stuck the latest Samba release on it - since that's what its purpose in life is right now.  There was one bug on the prior Samba release which made Windows boxen think that the server didn't know how to handle long file names.  Anyhow, that's fixed.

There are some things you just want to set up and never touch again.  I don't expect that I'll upgrade anything else on that box before it becomes scrap.

hoser
Thursday, May 13, 2004

Well you have a sense of Not broke = Perfect. Others have the feeling that Not broke = 90% fit for the job. (Probably lower.)

Peter Monsson
Thursday, May 13, 2004

Hoser,

I also am still on RedHat 9 with no intention to change.

Just me (Sir to you)
Friday, May 14, 2004

Martin,

according to that reasoning all your systems are perpetually broken, and in the best case you keep on finding out every day about just how broken they were the day before.

Just me (Sir to you)
Friday, May 14, 2004

Just me,

Yes!

If it was possible to make non-broken systems, we wouldn't have human sysadmins. We would have these perfect, non-broken systems to act as sysadmins, and the days of worms, crashes and bugs would be over.

But accepting a non-perfect world does not mean accepting sub-optimality, status quo, or even resignation. Not for me, at least.

One thing I've also learned from the answers posted, is that it is important to put the word "broken" in relation to the tasks the system must solve, and the degree of brokenness measured against the degree of stability the system has now.

So for me, "less broken" is still better than "more broken".

Martin A. Bøgelund
Friday, May 14, 2004
