Fog Creek Software
Discussion Board




The Perfect Code

I just came across a fascinating article from 1996 on NASA's "On-Board Shuttle Group" (they write the software that controls the space shuttles).  In light of recent events I thought you might find it interesting too....

"This software never crashes. It never needs to be re-booted. This software is bug-free. It is perfect, as perfect as human beings have achieved. Consider these stats: the last three versions of the program -- each 420,000 lines long -- had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors."

"And no coder changes a single line of code without specs carefully outlining the change. Take the upgrade of the software to permit the shuttle to navigate with Global Positioning Satellites, a change that involves just 1.5% of the program, or 6,366 lines of code. The specs for that one change run 2,500 pages, a volume thicker than a phone book. The specs for the current program fill 30 volumes and run 40,000 pages."

http://www.fastcompany.com/online/06/writestuff.html

Chi Lambda
Tuesday, February 04, 2003

And I'd be thrilled if I even saw a half-page spec at my company... Maybe that's why we have 1,250 bugs in the database from a little over 100,000 lines of VFP6 code.

Me
Tuesday, February 04, 2003

The same year ('96), Ariane 5 failed on its maiden flight. Apparently this was caused by a software failure, which in turn was caused by inconsistent specifications.

http://www.rvs.uni-bielefeld.de/publications/Incidents/DOCS/Research/Rvs/Misc/Additional/Reports/ariane.html


The guidance system for Ariane 5 was an upgrade of the one from Ariane 4, but Ariane 5 had a different trajectory, which pushed one of the numerical parameters out of bounds.

This would have been caught in testing, but only if the testing specification had been properly upgraded for Ariane 5. That didn't happen, even though the specifications and development processes were at the same level as NASA's. They still failed.
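
For what it's worth, the failure described in the inquiry report was a narrowing conversion: a 64-bit floating-point value (the horizontal bias) was converted to a 16-bit signed integer, Ariane 5's trajectory produced a value too large to fit, and the resulting operand error was not handled. A loose sketch of that failure mode, in Python rather than the original Ada, with invented names and numbers:

    # Loose sketch only: the real code was Ada and the values are made up.
    INT16_MIN, INT16_MAX = -32768, 32767

    def to_int16(value: float) -> int:
        """Narrowing conversion with a range check; out-of-range input raises."""
        if not INT16_MIN <= value <= INT16_MAX:
            raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
        return int(value)

    to_int16(12000.0)  # fine on an Ariane 4-style trajectory
    # On Ariane 5 the equivalent value was too large, the conversion raised,
    # and the unhandled exception shut the inertial reference system down:
    # to_int16(64000.0)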

IMO this points to a limit on how much we can know and control, maybe in the form:

(# Errors in code) * (# Errors in Specs) >= const

where const depends on the process and is always a positive real.

Meaning ...

One cannot pin down both the errors in the code and the errors in the specs at the same time.

Good code requires uncontrollably huge specs.
Simple specs are too vague, resulting in (at best) code with usability problems (not doing what it should).

If the above is true, then our development process would have to deal with these intrinsic uncertainties.

Cheers
Dino

Dino
Tuesday, February 04, 2003

Great read - thanks.

John Topley
Tuesday, February 04, 2003

I just like this metric:

bugs per line of code

Because it cannot really be a metric (let's call it a statistic), it makes no sense to say one bug per 420,000 lines of code. You could say 10 or 50 and it would be exactly the same.

How do you define a bug, anyway?

What I'd like to compare is the amount of time (in Europe, we're shy about money) needed to produce this code, including the specs.

The whole methodology, which must look like a kind of maxi-XP, should also be very interesting.

Ralph Chaléon
Tuesday, February 04, 2003

One bug really means, "One bug found."

The operative word being "found."

Whenever someone asserts that their code is so good, it only had 17 bugs, one has to ask, "how can you tell?"

Obviously, you can't. And then the question must be asked: did you only find 17 bugs because your code is so good that any remaining bugs are so obscure you wouldn't find them under all the circumstances you tested? Or did you only find 17 bugs because you didn't really look all that hard?

I expect that for any project that involves people's lives as absolutely as the space program does, the testing is exhaustive.
But don't ever mistake that for certainty that there are no bugs.

Practical Geezer
Tuesday, February 04, 2003

I had a professor in college who used to work at NASA and was a specialist in "mission-critical" code. It was cool because she knew a lot and taught us a lot about how not to write bugs in the first place. The problem was, if you had one "bug" (insert your own definition here; she sure as heck did), you were automatically down to a B, and so on and so forth. It was brutal.

Matt Watson
Tuesday, February 04, 2003

Of course they fail to mention the cost and the time it took. I'll write you a perfect application. Just give me a billion dollars and ten years to perfect it.

It's all relative. The fact is, the perfect application is not cost-effective in today's commercial software world. But with improved tools and more industry experience, software will naturally get better over time, like any other industry.

Ian Stallings
Tuesday, February 04, 2003

Practical Geezer,

there is a way to estimate the number of bugs left in the code: it's called statistical sampling.

A typical statistical sampling problem is: "what is the maximum number D of defects I can allow in a sample of X out of Y items, such that the total number of defects does not exceed T +/- err?"

Here is how it works:
1) take good working code and seed it with bugs, using known bug patterns
2) get people to review the code
3) get the code through testing
4) see how many of the seeded bugs were caught and how many slipped through

Based on how many of the seeded bugs were caught in code reviews and testing, you can extrapolate how many real bugs are still left in the code, within an error margin.

I've seen this done once on a project, and the number of bugs that showed up in support did not exceed the estimated number of bugs left.
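
Here is a rough sketch of the arithmetic, assuming (and it is an assumption) that QA finds seeded bugs at the same rate as real ones; all the numbers are invented:

    # Fault-seeding estimate: if review + testing catch seeded and real bugs
    # at the same rate, then
    #   real_found / real_total  ~=  seeded_found / seeded_total
    # All numbers below are made up for illustration.
    seeded_total = 50      # bugs deliberately planted
    seeded_found = 40      # how many of those review + testing caught
    real_found   = 120     # genuine bugs caught by the same review + testing

    detection_rate = seeded_found / seeded_total          # 0.8
    estimated_real_total = real_found / detection_rate    # ~150
    estimated_remaining  = estimated_real_total - real_found

    print(f"estimated detection rate: {detection_rate:.0%}")
    print(f"estimated real bugs still in the code: {estimated_remaining:.0f}")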

Cheers
Dino

Dino
Tuesday, February 04, 2003

Dino,

I know, but it does not give you any kind of certainty, especially since the bugs in your code are not at all distributed according to the nice laws of nature that make statistical analysis worthwhile.
I am not a statistician, but I remember from way back that statistical analysis is not something to be taken lightly. You have to know quite a lot about what you are analysing, and the population you analyse must adhere to certain rules.

I am not saying, "don't use statistical analyses to get a feel about things," just that you have to be very careful about what you conclude.

As you know, there are lies, damned lies, and statistics.

But the bottom line always is: even if the chance of there being a bug left is minute, there is no way to formulate positive proof that it isn't there. Save for anything so trivial that it is not worth mentioning :-)

Practical Geezer
Tuesday, February 04, 2003

Joel mentioned the same article in one of his writings.

Prakash S
Tuesday, February 04, 2003

Each engineer would have his own distribution of bugs!

Daniel Shchyokin
Tuesday, February 04, 2003

I like the idea of a course where you drop a grade for every bug found in your code. Most coders (myself included) have never written code debugged to that level, so it's really hard to estimate the amount of effort involved. Even if you never have to write "maximum reliability" code, some experience of working at the extremes would give you a much better feel for what is required to make code more reliable (or for how much effort you can avoid if your code does not have to be so reliable).

David Clayworth
Tuesday, February 04, 2003

I have written code to a pretty high level of perfection.  Not to a space-shuttle level of perfection, but very good for a commercial software package.  It's a small library that gets a lot of use and it has been at least a year since there was a bug found in it.  I'm very proud of it, so you'll have to excuse the boasting.

There is no spec, just a lot of experimentation and mental models in my head. Since it's just me, and I spent a lot of time researching and measuring the problem, all I really needed to do was document it after development was done. (I have no doubt that if I were coding around something I didn't have a firm grasp on, I'd need multi-hundred-page specifications.)

It has a fair load of assertions to make sure that everything is working correctly. Any potential input, no matter how unusual, is handled correctly. There are two test suites that exercise just about everything in the code: one is a set of test cases for boundary conditions that I wrote, and the other is sample data. I make sure both run correctly before I'm comfortable with the code.
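
Just to give a flavour of what I mean by assertions plus a boundary-condition suite (this is a made-up example, not the actual library):

    # Hypothetical example: internal assertions catch impossible states,
    # and a small suite hammers the boundary cases.
    def clamp(value: int, lo: int, hi: int) -> int:
        assert lo <= hi, "caller passed an inverted range"  # internal invariant
        return max(lo, min(hi, value))

    def test_boundaries() -> None:
        assert clamp(5, 0, 10) == 5      # in range
        assert clamp(-1, 0, 10) == 0     # just below the lower bound
        assert clamp(11, 0, 10) == 10    # just above the upper bound
        assert clamp(0, 0, 0) == 0       # degenerate range

    test_boundaries()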

This piece of code was also developed with enough of an eye towards the future and enough simplicity that it hasn't needed to be changed, which helps.

It's actually quite nice to have a few pieces of code in a project that are so stable that you can assume them to be correct while debugging a problem in that code path. 

w.h.
Tuesday, February 04, 2003

Last thought -- process is not necessarily everything. There is no single correct process; there are many different processes that can ensure the result is correct. That is the flaw in SEI's CMM, IMHO.

w.h.
Tuesday, February 04, 2003

I did a course at university on formal methods, mostly to do with a formal specification language called "Z". The idea was that all functions (in a very general sense) were specified using pure mathematics, with pre-conditions and post-conditions. What was nice was that there existed checking programs that could take this notation and check it for completeness. Z is not easy to use, and the step between it and code is great, but it can serve as a firm basis that later development can be referred back to.
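
You can get a rough flavour of the pre-condition/post-condition style with runtime checks (this is Python, not Z, nothing here is mechanically checked for completeness the way a Z tool would do it, and the example is invented):

    # Invented example, loosely in the spirit of a Z-style specification:
    # pre-conditions constrain the inputs, the post-condition constrains the result.
    def withdraw(balance: int, amount: int) -> int:
        assert amount > 0, "pre: amount must be positive"
        assert amount <= balance, "pre: cannot overdraw"
        new_balance = balance - amount
        assert 0 <= new_balance == balance - amount, "post-condition violated"
        return new_balance

    print(withdraw(100, 30))  # 70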

I believe IBM used Z to reverse-engineer CICS from its assembler base (pre-2.0, I think) into a new set of specifications for the C versions, the idea being that the functionality was exactly equivalent and that any holes in the original assembler code were filled.

Using such formal methods is extremely time-consuming, and you need mathematicians rather than developers to write and use them. It really only makes sense in life-or-death or similarly critical situations. What was very impressive was that our lecturer stood at the front of the class and said that with such methods you could *prove* your specification was *correct* in the strictest sense. It's a mighty impressive thing to be able to claim. The problem is then shifted onto the compilers and the later stages of development.

WhatTimeIsItEccles
Tuesday, February 04, 2003

I don't really believe in these systems in which you "mathematically" prove the correctness of code. In real life the mathematical proofs are so difficult and complex that you're just as likely to make a mistake in the proof itself. And in reality that's what happens.

Joel Spolsky
Tuesday, February 04, 2003

I wasn't referring to proving the correctness of the code; I agree that is a difficult thing to do. With Z you can prove the correctness of the specification, and if you have such firm foundations, it can make for a very strong edifice indeed. The beauty of Z and other systems like it is that there exist programs that do the checking for you.

I think the "halting problem" is used as a proof that you cannot have a program that can check other programs for correctness/halting, at least not in the general case.

WhatTimeIsItEccles
Tuesday, February 04, 2003

"This software never crashes. It never needs to be re-booted. This software is bug-free. It is perfect, as perfect as human beings have achieved. "

This may be slightly off-topic, but one of my arguments against the Strategic Defense Initiative is that the requirements for that program are too complex to be implemented without a lot of bugs. If the shuttle's software is as good as this poster indicates, would it be possible to make SDI software just as reliable? I hope I'm not comparing apples and oranges.

Peter H
Tuesday, February 04, 2003

Practical Geezer,

At the risk of beating an expired equine, here are a few thoughts on statistics and software.

"There are lies, damned lies, and statistics" ... that's absolutely true. One must always question the assumptions used for a statistic:

First of all, software bugs are Weibull distributed.

On a side note: the exponential distribution (the waiting time between independent events) is a particular case of the Weibull distribution. If there is a memory-like effect from one event to the next, the distribution is Weibull with a different shape parameter.

This is somewhat natural, since bugs are accidents, and after that there is a process of "bug interference" where some bugs recombine into normal behavior (two wrongs make a right!) or into skewed problems (they create bigger problems than they should).

And contrary to common sense, the distribution itself would not change with who the programmers are; only some of the shape parameters would.

Nevertheless, with a known distribution one can derive certain facts, like the distribution of undetected bugs (assuming we know how efficient the QA is). With a known distribution one can calculate probabilities, for example "what is the probability of having 5 bugs left in the code?", or "what is the average number of bugs left in the code, and how big is the deviation from that average?"

This is statistics: it is measurable, and the values would help determine how many people we need in support. In a way, it is a rigorous way to deal with uncertainties, if applied correctly.

Seeding bugs in the code gives us a measure of how efficient the QA is. Of course, bug detection has its own statistical distribution, and that may complicate things. But that's for another time...
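
To make the "probability of 5 bugs left" question concrete, here is a toy calculation, assuming purely for illustration that the number of undetected bugs follows a Poisson distribution with a mean taken from the seeding exercise:

    # Toy calculation with invented numbers: model the count of remaining
    # bugs as Poisson(3).  Both the choice of Poisson and the mean of 3 are
    # assumptions made only to keep the arithmetic simple.
    import math

    def poisson_pmf(k: int, mean: float) -> float:
        return math.exp(-mean) * mean ** k / math.factorial(k)

    mean_remaining = 3.0

    p_exactly_5 = poisson_pmf(5, mean_remaining)
    p_at_most_5 = sum(poisson_pmf(k, mean_remaining) for k in range(6))
    print(f"P(exactly 5 bugs left) = {p_exactly_5:.3f}")
    print(f"P(at most 5 bugs left) = {p_at_most_5:.3f}")
    print(f"average = {mean_remaining}, deviation = {math.sqrt(mean_remaining):.2f}")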

Cheers
Dino

Dino
Wednesday, February 05, 2003

Dino,

Thanks for the extra information.

Just to be sure, I acknowledge the points you make and never meant to dispute them. Statistics give you a good idea about how much extra testing you may need, how much support, or anything else that might be based on the quality of your product.
So in that way it is very useful.

However, I have seen many take probabilities for certainties and base totally invalid conclusions on totally valid statistics. Hence the word of caution about statistics. Obviously, your not in my target audience :-)

Practical Geezer
Wednesday, February 05, 2003

Correction, "you're not".

Practical Geezer
Wednesday, February 05, 2003
