Fog Creek Software
Discussion Board




CNN Article on buggy software and its cost.

Given the recent thread on software engineering and the future of our industry, I thought this article on CNN was interesting.

http://www.cnn.com/2003/TECH/ptech/04/27/buggy.software.ap/index.html

Nothing really new in the article, except that it's not too often you see this kind of stuff in the mainstream press unless there has been some recent catastrophic software failure.

Mark Hoffman
Sunday, April 27, 2003

Software is often buggy because it's developed using "cut to the bone" budgets. Software is very expensive to develop, and when a manager asks you to deliver it on a tight schedule, testing is often the first thing to be cut from the project.

Matthew Lock
Sunday, April 27, 2003

My theory is that software doesn't "look" like anything.  If you make something really small, really large, or really intricate, people respect it.  But with software, it looks all free, as if suddenly a computer's resources are limitless.

Most people don't live in the virtual reality, so it's understandable.

Tj
Sunday, April 27, 2003

That article is just Watts Humphrey stirring up some publicity and promoting heavy process over business realities.

Regarding the dishwasher,

1. The washer was defective, and the bozo who bought it should have taken it back to the store and demanded a refund, plus a credit for installing another one to compensate him for his lost time. It's his own fault for not demanding a refund, and calling a press conference instead is just silly.

2. What kind of bozo buys a computer driven internet enabled washing machine/refrigerator/toaster/blender anyway? All these mechanical devices are rugged hardy devices that work better without no stinkin computer.

Regarding the Mars Lander, as I understand it the software is thought to have turned off the engines because it mistakenly concluded that the ground had been struck: some sort of impact sensor was triggered spuriously when some other system deployed. This isn't a software bug but a design error, and possibly one that would not have been found no matter how much time was spent in design, without the foreknowledge that the failure had occurred. In no case would additional software testing have done anything to prevent the problem, any more than debugging an OS kernel will do anything to fix a CPU that overheats because the case is improperly vented.

I have more to rant about but I will stop now and go get something to eat.

Dennis Atkins
Sunday, April 27, 2003

Well, if it's from CNN.

It must be true.

Mike
Monday, April 28, 2003

Software Engineering Licensing would have prevented all of these problems.

   
Monday, April 28, 2003

> Software Engineering Licensing would have prevented all of these problems.

Yep. The companies involved would be able to escape blame completely by pointing to failures of the licencing system.

e
Monday, April 28, 2003

Given all of the replies on this thread, and on the one about engineering licensing, I have to ask:

Does anyone here write buggy code?

I ask because most of the discussions here blame management or other people for software problems.

"It's not my fault. Management is full of idiots."

"It's not my fault. The users are just stupid."

"It's not my fault. The system wasn't designed to do that."

"It's not my fault. The original developers were idiots."

Coder X
Monday, April 28, 2003

Reality #1:  Quality is not free.  Other than the US government, no one can stay in business on 3 lines of code per day.  [They almost sounded proud of this number.]

In many cases people are unwilling to pay for better quality.  Perhaps this is a little extreme, but how much would you pay to have a car that was guaranteed to run for 1,000,000 miles?  So now you can add $85,000 to the price of the car.  What?  You said you wanted a quality car and now you won't pay?  What people want is "free quality" and we can certainly get better at providing it.  However, it will continue to be a balance that will help some companies (and developers) succeed, while others fail. 

Mike Gamerland
Monday, April 28, 2003

There is a time-honored equation that many programmers and managers work very hard to disregard: Fast, cheap or high quality - pick any two.

Actually, I just glanced at the article and I was pleasantly surprised. I was expecting the usual condescending business publication viewpoint of "blame the stupid immature dumbasses that executives pay good money to, who develop crap because they haven't all been outsourced to India yet as they deserve to be."

Instead the article pointed the blame squarely at the main cause in its summary: consumer demand for glitz and non-core features over basic and essential functionality, which sacrifices stability for features.

This problem percolates all the way up the software food chain from consumers to developers. Companies feel that delivery of features is vital in order to maintain their market share so they flog developers to push new features out. A developer hardly has time to groom existing code for quality without being dogged to invent some new bleeding edge function.

This is only one aspect of a very complex situation, tho. Two other really big factors at work include the tendency of this industry to compulsively re-invent itself every 5 years, and to shitcan stable but "not cool enough" tools - much of this industry (I dunno, 40%?) is continually devoted to platform specific rewrites in some form. (But to be fair, this probably arises from the same syndrome that drives featuritis.)


And also, the lack of "best practices" that scale to different sizes of development organization. The space shuttle code is stringently reviewed line by line; commercial companies simply don't have that luxury.


I also have an intuition  that most of this industry's quality problems ultimately arise from the flaky prima donna nature of all programmers everywhere who endlessly want to dick with things that "just work". "Code stability" to most programmers implies "no more fun exploration to do". It is probably this basic dissatisfaction with status quo "process" that gives marketing departments at tech companies the ammo to keep driving up the feature level of products.


In fact, the extremely strong programmer aversion to any sort of stability is reflected in our emotional semantics: anyone who deals with old but working code is assumed to be a loser who won't keep up. Last year's vogue language is regarded as today's septic tank sludge. Geek culture tends to punish  those who want to develop quality slowly at a natural pace.

Bored Bystander
Monday, April 28, 2003

> I also have an intuition that most of this industry's
> quality problems ultimately arise from the flaky prima
> donna nature of all programmers everywhere who
> endlessly want to dick with things that "just work".

I have often seen the related problem of managers forbidding programmers to touch code that only allegedly works. The managers may bless workarounds and "temporary hacks," but won't acknowledge that a portion of the program is vulnerable and needs reconsideration.

While poor programming can be to blame, most often it's changes in requirements without corresponding changes in approved programming effort that lead to this defect situation. The programmers see the problem; they suffer the incessant bug reports. They can take the blame and grow to hate their work, or they can choose surreptitious heroics by way of insubordination. It's a fine mess.

Steven E. Harris
Monday, April 28, 2003

Dennis,

Are you suggesting that just because software runs something as silly as an Internet-connected toaster, its owner doesn't deserve to have quality software running the device?

Should a dishwasher be connected to the Internet? Who knows...

Should a dishwasher have bug free, quality software running it? Absolutely.

Go Linux Go!
Monday, April 28, 2003

Buggy software is part of our learning process - if there is no error, there is no more learning. As long as we have so much to learn about building software systems, we will produce buggy software.

Georg
Monday, April 28, 2003

Looks like I wasn't clear in what I was saying and it was misunderstood. Sorry.

Yes, if you buy a dishwasher. It should work. It should not crash.

If a consumer buys a stupid piece of nonworking junk, they need to take it back to the store and raise unholy hell until they get a refund plus a little something extra. I said he was a bozo because he just complained about it rather than take it back and raise hell.

The bug in the design as I see it is that a computer is present in the device. The fact that the code itself also has bugs seems irrelevant to me since the code should not be there in the first place.

Dennis Atkins
Monday, April 28, 2003

As Bored points out, the article actually calls on companies  to be made responsible for software; it doesn't castigate programmers. 

It also references calls for programmers to avoid irresponsible deadlines, as one contribution to avoiding bugs.

Also, if we think about this theme a bit, and following Dennis Atkins' approach, we could ask why faulty dishwashers and Mars landers are immediately seen as programming failures. They are probably failures in sensors, or perhaps even assembly.

.
Monday, April 28, 2003

The story points out that the Mars Polar Lander crashed due to a software bug. And boy, was it a bug. According to the NASA report (http://spaceflight.nasa.gov/spacenews/releases/2000/mpl/mpl_report_1.pdf through http://spaceflight.nasa.gov/spacenews/releases/2000/mpl/mpl_report_4.pdf), there were faulty software requirements. This made "Software Failure" the most likely point of failure. Quoting Section 7.7.2, Findings, Point 1:

Protection from transient signal behavior of the touchdown sensors was not specifically called out
in the requirements. The requirement that specified that “the use of the touchdown sensor data
shall not begin until 12 meters above the surface” was intended to eliminate any danger from
sensor failure modes, including transients. However, that requirement was not included in the SRS
in the requirements flowdown process, and it was not included in the requirements to be verified
during system testing. The protection from transient signal behavior was not adequately captured
in the system or subsystem requirement specifications, nor in the system-level test requirements.
Therefore system, subsystem, and test teams did not verify transient signal immunity during
software and system testing.

and Section 7.7.2, Findings, Point 9:

LMA MSP engineers presented the software issue described above to the Review Teams meeting
at LMA in Denver. It was not detected in software walkthroughs or unit tests, nor was it found
during the cruise phase of the flight. The touchdown sensor problem was found during a test run
on the 2001 Lander when a test engineer pushed a button indicating a touchdown too early in the
test. He released the button when he realized his error and was surprised when thrust termination
occurred prematurely. That led to a failure analysis that uncovered the software problem.

End Quotes.
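
For what it's worth, here is a rough Python sketch of how a transient like the one in Point 9 could end a descent early. It is only an illustration, not the actual flight software: the function name, the sample data, and the idea that an early indication is latched and acted on once sensing is enabled are all my own assumptions. Only the 12 meter figure comes from the quoted requirement.

```python
# Hypothetical illustration only; not the actual lander code.
SENSOR_ENABLE_ALTITUDE_M = 12.0  # figure taken from the quoted requirement


def engine_cutoff_altitude(samples, ignore_above_enable_altitude):
    """Return the altitude at which thrust is terminated, or None.

    samples: list of (altitude_m, touchdown_signal) pairs in descent order.
    """
    latched = False
    for altitude_m, touchdown_signal in samples:
        if ignore_above_enable_altitude:
            # Guarded: sensor data is not used at all above the enable altitude.
            if altitude_m > SENSOR_ENABLE_ALTITUDE_M:
                continue
            if touchdown_signal:
                return altitude_m
        else:
            # Unguarded: any indication, however brief, is remembered...
            if touchdown_signal:
                latched = True
            # ...and acted on as soon as touchdown sensing is switched on.
            if latched and altitude_m <= SENSOR_ENABLE_ALTITUDE_M:
                return altitude_m
    return None


# Made-up descent profile with one spurious indication high above the surface.
descent = [
    (1500.0, False),
    (1400.0, True),   # transient (the "button pushed too early")
    (1300.0, False),  # signal released
    (12.0, False),
    (5.0, False),
    (0.0, True),      # real touchdown
]

unguarded = engine_cutoff_altitude(descent, ignore_above_enable_altitude=False)
guarded = engine_cutoff_altitude(descent, ignore_above_enable_altitude=True)
print(unguarded, guarded)
```

In the unguarded version the engines cut off at 12 meters even though the signal was released long before; the guarded version ignores the early blip and waits for the real touchdown. Nothing in the unguarded code looks wrong on its own, which may be why walkthroughs and unit tests didn't catch it.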

But this does point to a software failure... but was it the cause of the loss?

A Software Build Guy
Thursday, May 01, 2003
