Fog Creek Software
Discussion Board




"Would it have prevented this" analysis

From the background check thread, I need to ask this - something I routinely ask when there's a knee-jerk reaction to a Bad Thing is "would this proposed post-horse-escape barn door closing have prevented the thing we're reacting to?"

I asked this of new security protocols after a breach, I was constantly asking it in the wake of 9/11 (not that anyone was checking with me), and relevant to this forum - I've asked it in software post-mortems.

There was a huge flail at Camel during an attempted deployment (turned out the app was still writing to the test database from the production system). The managers seemed to insist the biggest problem was that the deployment checklist wasn't ready for QA by the Wednesday before deployment. Mind you, the checklist *was* run and properly verified by everyone; it was just done late.

I kept asking the question like a broken record - "would this have stopped the problem from happening?" and got evil looks from management each time. *My* solution was that there should be an impermeable firewall between production and test - the install should've failed when it looked for the test database, not found it and chugged along happily writing to the wrong place. This suggestion was ignored in favor of a more thorough checklist.

Is "would this have prevented the problem" analysis flawed in some way? Shouldn't a post-mortem seek the solution to prevent future occurrences instead of simply coming up with a process which (while perhaps a better process than the one in place) doesn't address the reason you're there?

Philo

Philo
Wednesday, July 30, 2003

Seems to me that finding the source of the problem and making it so the problem cannot happen again is certainly necessary. Totally segregating your development and test networks sounds like a great idea to me.

I think also having a checklist to detect if the problem exists would probably be good for redundancy purposes, just in case the networks somehow got bridged without anyone in your group knowing about it.

Dave
Wednesday, July 30, 2003

Think of the implementation as a program (it is, but it's executed partly by machines, partly by humans). Think of the firewall as an ASSERT statement - independent of any checklists, it will fail the process and alert you. Think of the checklist as reviewing your code again.
Will reviewing your code prevent bugs? Not all of them, but you might find some. Does reviewing your code take time? Yes.
Will ASSERT prevent bugs? Not all of them, but some might be prevented. Does executing an ASSERT take time? A few nanoseconds of processor time.
Is reviewing your code worth the effort? Yes.
Is placing asserts worth the effort? Even more so.
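
As a rough illustration of the firewall-as-ASSERT idea, here is a minimal Python sketch of a deployment guard that refuses to continue when the configured database belongs to the wrong environment. Everything in it is hypothetical: the host naming scheme (`<name>.<environment>.example.internal`), the function name, and the exception name are assumptions made for the example, not details from the deployment described in this thread. It raises an exception rather than using a bare `assert`, because Python strips `assert` statements when run with `-O`.

```python
class EnvironmentMismatchError(RuntimeError):
    """Raised when a deployment points at a database from another environment."""


def check_database_target(environment: str, database_host: str) -> None:
    """Fail fast if database_host does not belong to the given environment.

    Assumption for this sketch: hosts are named
    '<name>.<environment>.example.internal'.
    """
    expected_suffix = f".{environment}.example.internal"
    if not database_host.endswith(expected_suffix):
        # Stop the install here instead of happily writing to the wrong place.
        raise EnvironmentMismatchError(
            f"{environment!r} deployment is configured for {database_host!r}; "
            f"expected a host ending in {expected_suffix!r}"
        )
```

Called once at install or startup time, a check like this costs next to nothing and does not depend on anyone having completed a checklist on schedule.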

Alexander Chalucov (www.alexlechuck.com)
Wednesday, July 30, 2003

Philo,
I enjoy your posts here a lot.
I would suggest that your most politic move would have been to vocally buy in to what mgmt said, since it obviously got on someone's nerves that the sacred checklist wasn't done in time. After that, figure out some way to follow through by proposing a new rule for the post-mortems: that all post-mortem-inspired diktats be put to the "Would it have prevented this" test. Propose it in a face-saving, management-oriented channel, where someone can champion the idea without looking like they're challenging someone else's turf.
Always AND the suits, never OR them. Their egos are delicate.

Israel Orange
Wednesday, July 30, 2003

Likely, your repeated insistence on "Would it have prevented this?" made the extended-checklist promoter feel defensive and cling ever more tenaciously to their idea, so as to avoid having to back down in public. And lose face. To an engineer, no less.
Let them have their little victories, and then implement real, useful change behind the scenes. It sucks to have to use indirection because of someone else's ego's frailty, but that's the way of life in an office.

Israel Orange
Wednesday, July 30, 2003

My problem with the "checklist" mentality is that if the checklist is met, then management is happy.  Never mind the fact that there are other glaring security holes or visible bugs in the software.  Provided that running through the checklist results in all checks, then everyone is satisfied.

I find this most discouraging, as for me, the checklist is often specified by the customer.  Yes, we meet their limited requirements, but our product doesn't meet my own standards.  How does one deal with this, when management's attitude adheres to a "checklist"?

Elephant
Wednesday, July 30, 2003

Code isn't the solution to every problem.

pb
Wednesday, July 30, 2003

Anything that would have prevented this analysis would be great. 

I crack myself up
Wednesday, July 30, 2003

Perhaps the checklist doesn't need to be changed, but the test environment does.

If you had a separate test mule NOT connected to the existing network, then that test would have failed.

So, perhaps a change in the test machine would be the solution...

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Thursday, July 31, 2003

It's generally a Good Thing that test environments are entirely separate from any other network or system. That it also exposes Bad Things like this is just a bonus, but it's not the reason for having separate test environments.

The real problem in the app writing to the test database is that there's a hard-coded address in there. That's a much more fundamental problem and one which implies that there are other assumptions and hard codings in the rest of the app.

It tends to suggest that the app was built to succeed rather than developed to a design and set of standards.

Those kinds of apps tend to be written by people who think a 'clean' compilation is a milestone, where clean means that warnings and compiler 'excesses' are ignored.
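
One common alternative to a hard-coded address is to take the database location from the environment the app is deployed into, and to refuse to start if it is missing. The Python sketch below assumes a hypothetical environment variable named `APP_DATABASE_URL`; the variable name and the function are illustrative only and are not taken from the app discussed in this thread.

```python
import os
from typing import Mapping


def database_url(environ: Mapping[str, str] = os.environ) -> str:
    """Return the database URL supplied by the deployment environment.

    Assumption for this sketch: the URL is provided in APP_DATABASE_URL.
    There is deliberately no default value, so a missing setting stops
    the app instead of silently falling back to some other database.
    """
    url = environ.get("APP_DATABASE_URL")
    if not url:
        raise RuntimeError(
            "APP_DATABASE_URL is not set; refusing to guess a database"
        )
    return url
```

The point of leaving out a default is that the same build can move from test to production unchanged, and a forgotten setting shows up as an immediate failure rather than as writes to the wrong system.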

Simon Lucy
Thursday, July 31, 2003

Tend appears to be my word of the day, he says tendentiously.

Simon Lucy
Thursday, July 31, 2003

Elephant said: "Yes, we meet their limited requirements, but our product doesn't meet my own standards."

My answer would be: lower your standards (in most cases). Relatively few of us here are writing code to cure cancer, help orphans, or run the Mars mission; most of us are slaving away to improve some gigantic corporation's profit margin by 0.0001%.

If you offer a suggestion once and it isn't taken, move on. It's Just A Job.

Mike
Thursday, July 31, 2003

Philo,

You are correct that something should be done to avoid the problem in the future. But you need to be careful to handle the real problem.

Was this problem just a symptom of bigger problems like too much time pressure, lack of QA staff, etc.?

In your case the managers most probably understood checklists and did not understand firewalls, so they wanted more checklists. If the problem was a project out of control because of time pressure, neither more checklists nor firewalls would handle it.


A separate point - there is a Japanese term, poka-yoke, used in process improvement, which refers to either making errors impossible (or at least difficult) or stopping the process when errors are detected.

One example of this idea is ribbon cable connectors that can only go together one way. Assertions are another way to do this.

john
Thursday, July 31, 2003
