Fog Creek Software
Discussion Board

"Murphy had other plans"

Can't help but read the latest Joel article and wonder if perhaps The Man might be drinking the purple kool-aid.

For ever and ever we get directives from Joel Central about usability and general robustness of both code and overall solution.  Now, a few days after the Latest Product ships and the online demo apparently falls over, The Boss explains that maybe it wasn't cost efficient to plan for / fix those demo bugs anyway, and by the way, do you know that our own in-house tools are buggy too and there's a good reason (cost) for that also?

Sounds like a bean counter talking here.  Sounds like a bean counter who might be losing his ass.

Welcome to Reality 101.

Agnus Moorehead
Saturday, November 9, 2002

Economics 101:

Run a company so that you can make money. The only way you can make money is if your customers are happy with your product.

In the case of internal Fog Creek applications, the customers are the employees at Fog Creek, and they sure do know how to work around the bugs. Hence, it does not make sense to fix those things when there are other, more important things to be done.

I assume they (Fog Creek) work off a list of things to be done, each with a priority, and this figures pretty low on it.

Prakash S
Sunday, November 10, 2002

Nonsense. If you own a company, better learn how to count beans.

I can't say I agree completely with this particular case. Perhaps 8 days would have been better than 4 if you take into account the potential damage to your image (i.e., "we tried to use FogBUGZ once, but their demo site was down, and I figured that if even they can't make it run, better try another solution"). Joel talked about it (the imaginary IBM customer).

On the other hand, OF COURSE internal tools have bugs, and it's better not to lose time fixing them. I can't imagine why a tool meant to be used in-house has to be perfect.

Leonardo Herrera
Sunday, November 10, 2002

Both you and Joel strike me as kinda nuts in this case.  On one hand, a lot of good consultants I know make these little calculations, as 2nd nature.

On the other, I think that little square, especially the 8 days figure, was pulled from someone's ass.  You can't weight that equally. 

But is 1 hr downtime so bad?  Many bad software people don't even try making a reasoned decision.

Sunday, November 10, 2002

After reading it I have the impression it was rationalizing a screw up. 

While I don't disagree that sometimes making a decision not to test for economic or other reasons may be the appropriate thing to do, not planning for the probability of that decision is entirely inappropriate - such as not keeping a closer eye on the status of the loading.

Joe AA
Sunday, November 10, 2002

Actually, Joel's article demonstrates pretty common behavior for successful software products.

The trial site actually became customer-facing software that needs to have the testing, etc. that you want in the product, because it does affect the impression that you leave.  There is an underlying principle that as your software becomes more successful, there are things that used to take a small amount of time or no time at all that now soak up a lot of time and energy.  One of the arts of successful management is anticipating which of these things deserve your time and energy and which still don't.  You need to make the kinds of calculations that Joel talks about while being sensitive to the impression you leave.

Joel's article gives another example as well.  There is no way that somebody who was being successful would assume that the week or so after a release would be relaxing.  There may be a lag, but there is always a big spike in activity that requires customer support right after a release.

Just other differences between the two worlds.

Sunday, November 10, 2002

"load testing is a no-brainer"

Well, maybe not load testing, but preparing for the outlying conditions certainly is.

I learned a long time ago that it's worth spending a little more time to make sure something can scale.

Sunday, November 10, 2002

Should Joel have tested for 10,000 simultaneous (or however many it was) hits to his demo site? Today we can sit back and say "sure, what idiot wouldn't?" but that is extremely unfair. You can only say that because you have substantially better information today than Joel did when the demo went live.

What if the total number of demo users over the life cycle of 2.0 was only 1,000? Would you then reasonably think that on the day you release 3.0 your demo requests would jump to 10x the previous volume? Anyone who could predict that should become a fortuneteller.

Without knowing the total number of requests you are about to receive, how could you define the load you are testing for? Sure, he could do some market testing to predict the interest of his demographic. I'm sure he has the $100k sitting in a cigar box to do it. :-)

Also, don't forget that this demo has been up and running for some time now without problems. It's unlikely that he considered that the existing design (which has functioned so well) would be fatally flawed.

Sunday, November 10, 2002

This is my own little table.

COST             do load testing    don't load test
server OK        1 day              0 days
server NOT OK    5 days             4 days
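
One way to read a table like this is as an expected cost. Below is a minimal sketch in Python that uses only the four engineer-day figures from the table above. The failure probability `p_fail` is a hypothetical input; nobody in this thread states one.

```python
# Engineer-day costs copied from the table above.
# Keys: (strategy, outcome) -> engineer-days.
COSTS = {
    ("load_test", "ok"): 1,
    ("load_test", "not_ok"): 5,
    ("no_load_test", "ok"): 0,
    ("no_load_test", "not_ok"): 4,
}


def expected_days(strategy: str, p_fail: float) -> float:
    """Expected engineer-days for a strategy, given the chance the server fails.

    p_fail is a hypothetical input; the discussion does not provide it.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    ok_cost = COSTS[(strategy, "ok")]
    not_ok_cost = COSTS[(strategy, "not_ok")]
    return (1 - p_fail) * ok_cost + p_fail * not_ok_cost


if __name__ == "__main__":
    # Illustrative probabilities only.
    for p in (0.1, 0.5, 0.9):
        tested = expected_days("load_test", p)
        untested = expected_days("no_load_test", p)
        print(f"p_fail={p:.1f}  load test: {tested:.1f} days  "
              f"no load test: {untested:.1f} days")
```

With these particular figures the load-testing column is one day more expensive at every value of `p_fail`, because the table charges the same 4 days of repair either way. Engineer-days alone cannot favor testing here; any case for it has to come from costs the table leaves out.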

I don't know of anybody who takes 4 days to load test.  I've worked in my application space for long enough now that I pretty much know just by looking at some code or system whether or not it will scale.  On those where I can't tell, I usually load test as I go, making it a part of my unit tests.

The private database scheme sounds like an idea that definitely doesn't scale, and that's when I think I would have nipped it in the bud. 

Sunday, November 10, 2002

Joel's math regarding making decisions is either totally baked, or he's not selling any software.

An engineer day, for a load tester, is worth about $1000, MAX. Even in NYC, a "load testing engineer" is only worth about $70K a year, which is about $300/day. Add in overhead, and you get close to $1000/day.

So even 4 days of load testing is only $4000. That is the cost of 2 unlimited FogBUGZ site licenses, or about five 20-user licenses, or 20 5-user licenses, or 4 Aeron chairs, or one workstation with a huge monitor, or a supersized number 2 meal at McD's for the development team every day, for a year.
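
The arithmetic above can be checked with a short Python sketch. The salary, the $1000/day loaded rate and the 4 days come from the post. The 250 working days per year is an assumption added here to turn an annual salary into a daily figure.

```python
ANNUAL_SALARY = 70_000        # from the post: "$70K a year"
WORKING_DAYS_PER_YEAR = 250   # assumption: roughly 50 weeks of 5 days
LOADED_DAILY_RATE = 1_000     # from the post: close to $1000/day with overhead
LOAD_TEST_DAYS = 4            # from the post: 4 days of load testing


def daily_salary(annual_salary: float, working_days: int) -> float:
    """Bare salary cost of one engineer-day, before overhead."""
    return annual_salary / working_days


def load_test_cost(days: float, daily_rate: float) -> float:
    """Total cost of a load-testing effort at a given loaded daily rate."""
    return days * daily_rate


if __name__ == "__main__":
    per_day = daily_salary(ANNUAL_SALARY, WORKING_DAYS_PER_YEAR)
    overhead_multiplier = LOADED_DAILY_RATE / per_day
    total = load_test_cost(LOAD_TEST_DAYS, LOADED_DAILY_RATE)
    print(f"Salary per day: ${per_day:.0f}")
    print(f"Implied overhead multiplier: {overhead_multiplier:.1f}x")
    print(f"Cost of {LOAD_TEST_DAYS} days of load testing: ${total:.0f}")
```

Under that assumption the bare salary works out to $280/day, close to the post's "about $300/day", and the $1000/day loaded rate implies an overhead multiplier of roughly 3.6.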

No load tester should take 4 days anyway, more like 4-8 hours. This latest column was very disappointing.

Sunday, November 10, 2002

btw, let's compare fogcreek with the real alternative.  We're putting them on a pedestal, comparing them with ninjas.  Instead let's ask what a bad company does.

A bad company, like the one I'm at, makes schedules that are unbelievably optimistic.  The head IT guy has been maneuvered out of real power because he had 4% less stock, and we have no idea how much time it actually takes to do any task.  We overbudget for capacity insanely, so our ISP and hardware vendors laugh to the bank.  Except that we misused the lower-salaried people, so no one actually loadtested in any meaningful way.  So when problems visit, many people touch the product, and the "engineer-days" pile up.

Fogcreek does not ever have to worry about playing in this league.  Any mistakes they make will have relatively light consequences.

Allocating hours seems like an interesting art.  There's someone in my guest room right now estimating hours for a proposal.  Allocating time accurately is usually much better than having a great "top speed."  Four engineer-days seems really reasonable if you're allocating the only time that matters -- the amount you can't directly bill for.

Sunday, November 10, 2002

If that last sentence sounded strange, I had to run. ;)

Sunday, November 10, 2002

the issue is that Joel did not make his engineer hours table BEFORE the server blew up. it would have been a refreshing breath of integrity to see a story about "whoops, we didn't think to load test the server - sorry!" rather than conjuring up after-the-fact business analysis about why releasing a broken server really just illustrates fog creek's superior business acumen.

Sunday, November 10, 2002

Maybe that's his way to say "sorry" :-)

Leonardo Herrera
Sunday, November 10, 2002

"the issue is that Joel did not make his engineer hours table BEFORE the server blew up"

And you know this how?

It should also be asked, do you know of any other company that would host a discussion about their "lack of credibility"? Most would have shut this down and deleted the thread a long time ago. That alone says a lot about Joel and his company.

And load testing or not, I'll be purchasing FogBugz for my development team. They make a good product. That says even more about Joel and his company.

Sunday, November 10, 2002

Joel's after-the-fact rationalization is well below what I had come to expect from Joel.  This post has sadly knocked him down a few notches in my mind.

If this was financial software, several million dollars could have been lost.  No, it's not financial software, and yes, you can do a cost-benefit analysis to decide if it's important to do the testing.  Although I don't really wish ill on anybody, it would certainly serve him right if he did lose some big sales while it was down, for giving us his lame after-the-fact rationalization.

Come on Joel, you're better than this.  We expected you to say "oops, we messed up and learned this lesson."  Not, "I'm not going to say we messed up because I can come up with some mealy-mouthed weasel-out excuse."

Where can I sign into Joel's personal FogBUGZ and enter the bug report "Bug#73:Joel does not admit his mistakes"?

Gregg Tavares
Monday, November 11, 2002

Come on Joel, you're better than this.  We expected you to say "oops, we messed up and learned this lesson."  Not, "I'm not going to say we messed up because I can come up with some mealy-mouthed weasel-out excuse."
---------------------------------------------------------------- Gregg

He did say 'We learned this lesson.'

The lesson was that they don't need to worry too much about load testing.

Ged Byrne
Monday, November 11, 2002

I fully agree with Joel here.  It's about risk vs return, and in this case there was a small risk and little probable impact on income, so the money was not worth spending.

However, for each bug you need to ask yourself how many extra copies you are going to sell by fixing that bug.

I've seen too many companies that don't do this analysis when looking at their bug list.  You have to ask yourself how many extra copies you will sell if you fix the bug. In many cases the answer is none at all, so why would you even consider fixing it?
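
That kind of analysis can be sketched in a few lines of Python. The bug list, price per copy and day rate below are hypothetical placeholders made up for illustration; only the rule "fix it if it sells enough extra copies to pay for itself" comes from the post.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    fix_days: float      # estimated engineer-days to fix
    extra_copies: float  # estimated extra copies sold if fixed


def net_value(bug: Bug, price_per_copy: float, day_rate: float) -> float:
    """Extra revenue from fixing the bug minus the cost of fixing it."""
    return bug.extra_copies * price_per_copy - bug.fix_days * day_rate


def worth_fixing(bugs: list[Bug], price_per_copy: float, day_rate: float) -> list[Bug]:
    """Bugs whose fix pays for itself, most valuable first."""
    paying = [b for b in bugs if net_value(b, price_per_copy, day_rate) > 0]
    return sorted(paying,
                  key=lambda b: net_value(b, price_per_copy, day_rate),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical bug list and prices, for illustration only.
    bugs = [
        Bug("typo in internal admin page", fix_days=1, extra_copies=0),
        Bug("installer fails on one platform", fix_days=2, extra_copies=50),
        Bug("slow search on large databases", fix_days=0.5, extra_copies=10),
    ]
    for bug in worth_fixing(bugs, price_per_copy=100, day_rate=1000):
        print(f"{bug.title}: net value ${net_value(bug, 100, 1000):.0f}")
```

The hard part in practice is the `extra_copies` estimate, which this sketch simply takes as given.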

Similarly, I doubt that doing this testing in advance would have led to a single extra sale of the software, so why do it?

Monday, November 11, 2002

Can I just say:

He's said it before, why does it disappoint you now?

Mr Jack
Monday, November 11, 2002

The entire 'Joel offering' has certain values - appropriate functionality, good robustness and high quality at its core (my reading anyway). The FogBugz trial should have demonstrated those same values.

Joel's table looks at "Cost" purely in terms of engineering time. A decent risk assessment for Fog Creek Software should have included a column for cost in terms of credibility and lost sales as a result of the server not coping.
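
A sketch of what that extra column might look like, in Python. Every number below (engineer-days, dollar amounts, day rate and failure probability) is a hypothetical placeholder, not a figure from Joel's article or from Fog Creek. Only the idea of adding lost sales and credibility to engineering time comes from the post above.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    engineer_days: float  # time spent testing and/or fixing
    lost_sales: float     # dollars of sales lost while the demo is down
    credibility: float    # dollars assigned to reputational damage


def total_cost(outcome: Outcome, day_rate: float) -> float:
    """Engineering time priced at day_rate, plus the two extra columns."""
    return (outcome.engineer_days * day_rate
            + outcome.lost_sales
            + outcome.credibility)


def expected_cost(ok: Outcome, not_ok: Outcome,
                  p_fail: float, day_rate: float) -> float:
    """Probability-weighted cost of one strategy across both outcomes."""
    return ((1 - p_fail) * total_cost(ok, day_rate)
            + p_fail * total_cost(not_ok, day_rate))


if __name__ == "__main__":
    DAY_RATE = 1000   # hypothetical dollars per engineer-day
    P_FAIL = 0.25     # hypothetical chance the server cannot cope

    # Hypothetical scenario: testing finds the problem before launch,
    # so there is no outage and nothing in the extra columns.
    with_test = expected_cost(Outcome(4, 0, 0), Outcome(8, 0, 0),
                              P_FAIL, DAY_RATE)
    # Hypothetical scenario: no testing, so a failure means an outage.
    without_test = expected_cost(Outcome(0, 0, 0), Outcome(4, 5000, 10000),
                                 P_FAIL, DAY_RATE)
    print(f"Load test first: ${with_test:.0f}")
    print(f"Skip load test:  ${without_test:.0f}")
```

The comparison is only as good as the dollar values put on lost sales and credibility, which is exactly the judgment the table in the article leaves out.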

A bit of hubris does us all good once in a while - I'm sure my time will come (as will yours!), though I'm hoping it doesn't get the exposure that this one has!

Mark Smith
Monday, November 11, 2002

Here's the best litmus test:

Do you REALLY think he won't load test the next big thing he releases?

My money is betting that he will.  He may even draw a little box to justify it.

Monday, November 11, 2002

Joel just illustrated a savvy management technique, CYA (Cover Your Ass).  It goes something like this.

1. Drop the ball or screw up somehow
2. Defer the blame -or- apply heavy duty spin doctoring to justify the "decisions"
3. Declare that instead of being blamed you should be commended for saving the company money by being so smart.

Brilliant.  Please don't try this unless you have several years of management experience already.

"And in the long run we scientists will win."

Oh please.  For one thing, this is basic web site stuff here, you're not sending people to the moon.  Also, scientists and engineers solve problems or prevent them from happening in the first place.  Bean counting managers are the ones doing the fancy economic analysis.

"Hello Mr. Customer (or Potential Customer).  Yes I know the website is down.  We thought we might have a problem with it, but after a thorough analysis we decided we had more important things to do than being proactive and fixing it up front.  I hope this doesn't inconvenience you.  But hey, the good news is that we're much better when it comes to the products we sell!"

Monday, November 11, 2002

Hm. That's being picky about quality. Quality is an important thing but, contrary to the general belief, not the most important thing.

Leonardo Herrera
Monday, November 11, 2002

Well, let's suppose that quality is extremely important.  Even then, we're talking about tradeoffs.  Spending your resources on one thing may have meant no time for quality on another.

But who knows fogcreek's situation?  Fogcreek doesn't have enough transparency so that anyone can make an intelligent judgment on whether they did the right thing /for their situation/.

Monday, November 11, 2002

I am actually one of the people who experienced the temporary glitch, and I can tell you that I was EXTREMELY disappointed. Not necessarily because the glitch happened... such things happen (even if you do load testing), so they are somewhat “acceptable”... What I cannot understand is not investing the minimal effort to anticipate that a failure might happen (especially since you KNOW you didn’t load test) and prepare to handle it gracefully. I mean... does anybody think that preparing a customized error page takes 4 days? One that says “Sorry, we’re down... the appropriate mail will be sent to our engineers” and then, soon after that, another page that says “We’re temporarily offline! We are sorry for the inconvenience”. Did Fog Creek lose sales during that period? You bet it did! And another thing... instead of saying “Guys, we’re sorry... we fixed the problem. And we did this and that to minimize the chance of it happening again” after the event, we get “explanations”... disappointing....

Monday, November 11, 2002

A while back, there was a problem with Fog Creek's mail server: it was sending out the same email quite a few times.

I sent Joel an email about this, and he fixed it and responded to my email, all in less than 30 minutes (if my memory serves me correctly).

You have another thought coming, if you think he is not proactive, or does not take his customers or people who visit this discussion forum seriously.

Prakash S
Monday, November 11, 2002

“Guys, we’re sorry..."

Corporations are like prosecutors.  They never admit mistakes, and certainly never apologize.

Anon, also
Monday, November 11, 2002

Joel *chose* to not be proactive.  The whole point of his article was to explain why he thought it made good business sense to do so.  He obviously thought the opportunity and actual costs of researching and fixing a (potential) problem were too high.  He was willing to live with the resulting ramifications.  That's fine.  It's his company and he knows a lot more about its internal issues than anyone here.

But, given what information he has provided, I disagree with what he did.  It just smacks of somebody dropping the ball and then trying to save face.  I certainly could be wrong, but hey, this is what this forum is all about, right?

Monday, November 11, 2002

When I read his latest, my first thought was:

"You screwed up and now you are trying to explain it away by showing how smart you are by doing a cost-benefit analysis."

And then I thought about it...and thought about it...and I still haven't left my original conclusion.

Joel, we love the site, we wouldn't keep coming here if you didn't have a lot of good things to offer. But c'mon...You goofed and that's ok. Don't try to pass it off like it was a good thing you didn't do load testing.

I find it hard to swallow that it would have taken that long to do some additional load testing. I also think you are minimizing lost revenue because of the outage. Rather, I think you are fudging numbers to rationalize a mistake.

Another Anonymous Coward
Monday, November 11, 2002

Hey, Another Anonymous Coward --

I think you're thinking too much!

I think Joel just realized that maybe conventional wisdom ("I should have load tested ... this wouldn't have happened if I had load tested") is wrong -- as evidenced by his cost-benefit analysis.  You can dispute his cost-benefit analysis, of course, but let's assume for the sake of argument that it is sound.

I don't see this as Joel's attempt to shore up his ego after he goofed.  I see this as Joel passing on an interesting insight gained from experience.

I doubt he's so insecure that one small problem affecting a few users would send him scrambling for justification.

Monday, November 11, 2002

Guys, keep in mind that most of us *wouldn't know* about the server glitch if Joel hadn't posted his musings. So it can hardly be classified as "You screwed up and now you are trying to explain it away by showing how smart you are by doing a cost-benefit analysis."

Monday, November 11, 2002

I sincerely hope (and assume) that Joel consulted with the poor bastards ...errr engineers who got the nod to do the on the fly, demo's down, fix it now re-architecting.

Many managers would tend to think that 4 man-days before are the same as 4 man-days after. After all, the table says so, doesn't it? As a worker bee, experience says it's really 4 carefully considered 8-hour man-days BEFORE vs. 4 stress-filled, no-fun (dare I say it, more than 8-hour) man-days after.

I can easily see a manager asking the team:
Q1: How long would it take to stress test the system.
A1: About 4 days.
Q2: How long would it take to re-architect via plan x?
A2: About 4 days.

Without ever asking them what they thought about it, or even mentioning the trade-off going through his/her mind. Don't do this. The more say, and the more warning, the grunts have on those little trade-off decisions, the happier they and you will be.

There's a big difference (in my mind anyway) between 4 hellish days a manager unleashed on me against my better judgement and 4 hellish days I knew about as a possibility with my better judgement. There's a world of difference, and I'm sure Joel handled it perfectly, I just wish he had formalized it in his article.

Phil Aaronson
Monday, November 11, 2002

Apart from Joel's mea culpa I thought the article highlighted a grim part of the business: For most software the job is never finished.

Human nature leads us to hope that you can relax a little after making the release date - after months of "dreaming in code."  It rarely works like that.

Tuesday, November 12, 2002
