Fog Creek Software
Discussion Board

Joel's wrong on this one.

The failure of the new CLCS project doesn't invalidate the Shuttle's SEI Level 5 'Poster Child' team.  It validates it.

The software produced and maintained by the SEI Level 5 team is the archaic code they were hoping to replace.

This is actually a big win for SEI Level 5, because even using the latest techniques and technologies, they were unable to better the 30-year-old system.

I'll bet their mistake was to rewrite from scratch.  As the Fast Company article states, the old system had 30 years' worth of bug fixes stored away.

The article's dismissal of the old code base as 'archaic' seems to indicate that the new developers lacked respect for the existing code.

Still, Joel can take refuge in the fact that there is egg on Sun's face:

Ged Byrne
Wednesday, September 18, 2002

"The challenge will be to replace an existing system for which requirements are well-defined. CLCS development must separate real requirements from 20 years of cultural influences and obsolete hardware."

This link seems to confirm that they made the classic rewrite-from-scratch mistakes.

It looks like the Architecture Astronauts were at work, too.  Were embedded web servers and Java applets really the right tools for the job?

Ged Byrne
Wednesday, September 18, 2002

It is always surprising to me how many large organizations still rely on applications that were coded when Reagan was still in his first term (or earlier). If the company was big enough to buy a mainframe back then and build, say, their billing application, they could easily be using that same application today (hey, that COBOL isn't pretty but it's 100% debugged).

I've met a number of people who sheepishly admit they can't justify the cost of "upgrading." What's the alternative? Port it all to SAP? Well, lots of people have tried that and I've never met anyone who found that inexpensive or painless.

There are lots who have found the better alternative is emulation and/or middleware (and by middleware, a lot of  people mean that perl script they hacked out last night to grab stuff from the mainframe every morning at 3am, munge it, dump it into Oracle and then feed it out to a Web Service or some PowerBuilder app).
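That "perl script hacked out last night" is really just a tiny extract-transform-stage job. Here's a minimal sketch in Python rather than Perl; the record layout, field names, and cleanup rules are all hypothetical stand-ins for whatever the mainframe actually dumps:

```python
# Sketch of a hacked-together middleware job: pull rows from a legacy
# export, normalize them, and stage them for a downstream system.
# All names and the record layout here are made up for illustration.
import csv
import io

def munge(rows):
    """Normalize legacy records: strip padding, upcase IDs, use integer cents."""
    out = []
    for row in rows:
        out.append({
            "account": row["account"].strip().upper(),
            "balance_cents": int(float(row["balance"]) * 100),
        })
    return out

# Stand-in for the nightly mainframe dump (real dumps are more likely
# fixed-width EBCDIC; CSV keeps the sketch short).
legacy_dump = "account,balance\n  ab123 ,10.50\n  cd456 ,2.00\n"

rows = list(csv.DictReader(io.StringIO(legacy_dump)))
staged = munge(rows)
# 'staged' is now ready to be bulk-inserted into Oracle or served
# out through a web service or a PowerBuilder front end.
```

The point of the sketch is how little there is to it: the hard part of CTO Beta's approach is knowing the legacy data, not the glue code.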

But what's a CTO to do? What sounds better when they chat with the CEO:

CTO Alpha: "We are investigating the latest and greatest technology from [insert big name vendors] that will support [insert buzzword] at a cost of [insert exorbitant cost] and I've hired the smart people to do this"


CTO Beta: "You remember Bob Cratchett? Well last week he and Kermit D. Frog spent a few days writing some web scripts so customer service agents can get billing history of all our clients through their browser. We should probably add a few features, but guess what--it didn't cost us anything other than their salary for two days"

I suspect that to a lot of CEOs, CTO Alpha sounds like a smart, sophisticated technologist who will lead the company forward through the twisted technology maze--but CTO Beta sounds like a crank who spends his spare time at flea markets and garage sales. And yet, I think CTO Beta is onto something.

Karl Fast
Wednesday, September 18, 2002

I'm not going to get into CMM or anything like that. 

The majority of the components in these systems are HARD real-time systems.  That didn't seem to crop up in the article, although it mentioned that pieces of the system were on "standard hardware".

I don't think it's realistic to suggest simply writing an emulator for the existing systems to run on PCs--unless you have some scheme for making sure the real-time requirements are met.

What happens when the emulator gets ported from a real-time OS to Windows XP, and people start wondering why the system just became unpredictable?
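That unpredictability is easy to demonstrate. A toy Python sketch that measures sleep/wake jitter on whatever OS runs it -- on a general-purpose desktop OS the worst case has no guaranteed bound, which is exactly what a hard real-time system cannot tolerate:

```python
# Toy illustration: on a non-real-time OS, the error between when you
# asked to wake up and when you actually woke up varies run to run.
# A hard real-time system cares about the WORST observed error, not
# the average -- an RTOS bounds it by design; a desktop OS does not.
import time

def worst_wakeup_error(period_s=0.001, iterations=200):
    """Sleep for a fixed period repeatedly; return the worst overshoot."""
    errors = []
    for _ in range(iterations):
        start = time.perf_counter()
        time.sleep(period_s)
        elapsed = time.perf_counter() - start
        errors.append(elapsed - period_s)  # how far past the deadline we woke
    return max(errors)

worst = worst_wakeup_error()
# On a loaded desktop this can easily be several milliseconds --
# already longer than the entire deadline of a sub-millisecond loop.
```

Run it twice on a busy machine and you'll get two different answers; that variance is the whole objection to "just run the emulator on Windows."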

Wednesday, September 18, 2002

I'm with CTO Beta.  I spent the latter part of my career trying to keep bad projects from CTO Alpha from being done, and when a bad project got started, enlisting the cranks to bail them out.

Wednesday, September 18, 2002

What I think is really required is a decent method for taking an old code base and improving it.

I think Martin Fowler's refactoring is a step in the right direction.  It provides mechanical methods for tackling the complexity in existing code.
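Fowler's "Extract Function" is mechanical in exactly this sense: each step preserves behavior, so it can be applied to a crusty old code base without a rewrite. A contrived before/after in Python (the invoice example is mine, not Fowler's):

```python
# Before: one tangled function with a business rule buried in it.
def invoice_total_before(items):
    total = 0
    for qty, price in items:
        total += qty * price
    if total > 100:
        total = total * 0.95  # bulk discount hidden in the flow
    return total

# After: two mechanical Extract Function steps. The discount rule
# now has a name and can be tested (and changed) on its own.
def subtotal(items):
    return sum(qty * price for qty, price in items)

def apply_bulk_discount(amount, threshold=100, rate=0.95):
    return amount * rate if amount > threshold else amount

def invoice_total_after(items):
    return apply_bulk_discount(subtotal(items))
```

The safety property is the point: at every intermediate step both versions return identical results, so you can stop whenever you like and the code still works.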

With regard to the NASA code: if the FastCompany article is right regarding the quality of the requirements document, then redeveloping the system on any new platform should be a piece of cake.

They have a database full of test cases; there really should have been no problem at all.

However, it would seem that they didn't want to do it the easy way.  Instead they wanted to be 'cutting edge.'

Ged Byrne
Wednesday, September 18, 2002

These guys aren't the guys in the FastCompany article.  CLCS is ground control software/hardware.  The FastCompany article was about the Shuttle Flight Software group.

However, there are undoubtedly volumes upon volumes of requirements for the current ground system.  They may not be transferable or searchable or even electronic, but I'm sure they exist.

Wednesday, September 18, 2002

This isn't a programming problem, or a technology problem, or even a project management problem. It's a political problem.

Among a number of books on this subject, you might want to read Steven Kelman's very short (183pp) "Procurement and Public Management: The Fear of Discretion and the Quality of Government Performance." It's all about Federal procurement of computer systems. From the back cover:

"Requirements intended to promote competition in contracting have made the performance of government worse, not better, according to Professor Kelman. His studies of federal procurement of computer systems show how practices designed to prevent collusion discourage managers from withholding contracts because of poor performance. Officials cannot develop the high-tech applications necessary for the best use of tax dollars, he says, because they cannot maintain long-term informal relationships with the best contractors. Case studies document the author's arguments."

Case study 7 is "FAA Host Computer Replacement."

In other words, if a software shop you hired did a spectacular job, you can't take that into account in evaluating their bid on the next project.

BTW, I love the title of chapter 3: "The Tyranny of the Proposal."

Ethan McKinney
Wednesday, September 18, 2002

Keep in mind, they're not simply trying to replace the exact functions of the existing systems. They're trying to build all-singing, all-dancing integrated systems with constant automatic data integration nationwide. They want the system to warn controllers of possible problems (collisions, traffic jams, etc.) and display things like how backed up a given airport is.

Now, the idea of integration isn't necessarily unreasonable. After all, they're still passing all sorts of data around by human power: voice, typing, reading the typed messages and inputting the information into your terminal, etc., etc. This sort of archaic human-in-the-loop system obviously introduces all sorts of bottlenecks and chances for error. But it sure makes designing and building the system intelligently challenging!

Ethan McKinney
Wednesday, September 18, 2002

The original software (less than 1 MB) had been adequate for two decades. Now that the replacement project has failed, the original software must continue to be adequate. Instead of a whole-hog rewrite, why couldn't they port that tiny old program and then maybe extend it to add the "necessary" new features (which they are apparently doing without at the moment, since they are still using the old software)?

Zwarm Monkey
Wednesday, September 18, 2002

Why do so many people get such a kick out of writing "Joel's wrong"? It's like they sit there waiting for a new article to pounce on.

Wednesday, September 18, 2002

"Instead of a whole-hog rewrite, why couldn't they port that tiny old program"

Try getting appropriations out of Congress for _that_ one.

"Yes, we'd like to just re-write the 30-year old software and stick it on a cheap PC."

"What? Why aren't you modernizing the system? Why are we throwing away money rebuilding what we already have? Where are the cool demo screenshots? Why aren't you starting over from scratch and building something whiz-bang amazing?"

"Because we're f***ing morons who can't run a project that complex."

It's all about the Benjamins.

Ethan McKinney
Wednesday, September 18, 2002

Ethan, you bring back painful memories with each post, but don't stop.

Reminds me of "the Woody Allen gag about trying to find a framework to turn a concept into an idea."

And I think the purpose of this forum is the re-examination of Joel's and our ideas.  If Joel is wrong he'd like to be the first to know it.

Wednesday, September 18, 2002

Ged: "The failure of the new CLCS project doesn't invalidate the Shuttle's SEI Level 5 'Poster Child' team. It validates it. The software produced and maintained by the SEI Level 5 is the archaic code they were hoping to replace."

You can't possibly be serious! SEI didn't exist when the original system was created in the 1970s, much less the pinnacle of methodology, the vaunted SEI Level 5. So your statement is historically impossible.

I don't know if the CLCS was supposed to be operating at level 5 or not, but I thought that all NASA projects were big on process, so this certainly is not an example of Big Process succeeding.

X. J. Scott
Wednesday, September 18, 2002

"You can't possibly be serious! SEI didn't exist when the original system was created in the 1970s, much less the pinnacle of methodology, the vaunted SEI Level 5. So your statement is historically impossible."

I don't see why. I was under the impression that the SEI models enable you to assess how effective your software development process is. Presumably a development group working on software that was started in the 1970s can still have SEI level 5?

It looks like there's some confusion here about what "SEI level 5" actually is. Personally I'm confused -- to me, the SEI is an institute, and their CMM is a way of assessing a project's/company's software development process according to certain criteria. I have a feeling if you asked people "what does SEI level 5 mean" you'd get a whole bunch of different answers.

Adrian Gilby
Wednesday, September 18, 2002

OK, I'll clarify -- "*The* SEI didn't event exist. The institute itself handn't been founded when the systems were designed and built. So saying that their maturity model was used then doesn't make sense unless they've got a time machine' which I'm not ruling out, but I think is unlikely.

X. J. Scott
Wednesday, September 18, 2002

Arg! Retarded typos I make when aroused! I'll just rewrite.

OK, I'll clarify -- *The* SE Institute itself hadn't even been founded in the early 1970s when the old Launch Processing Systems (LPS) were designed and built. So, saying that the SEI maturity model should be given credit for the success of LPS doesn't make sense unless NASA's got a time machine. I'm not ruling that out, but I think it's extremely unlikely.

As far as definitions, I'm just going with the ones from their free, long-winded PDFs that describe the whole Capability Maturity Model on their website, and from the summaries of such in Yourdon, where I first heard about this stuff years ago. I assume from this we both know what Level 5 is -- it's the metaprocess level where the process *itself* is measured, fed back into itself, and improved.

X. J. Scott
Wednesday, September 18, 2002

According to a most informative post by cheeto in the thread entitled "After $273 million, NASA scraps computer project", the level 5 software was developed by the Shuttle Flight Software group, and has nothing to do with the ground software, new or old.

I'm impressed by the amount of money spent. It makes a similar debacle in New Zealand, which cost around $NZ 100 million, or around $50 million US, look fairly modest. Mind you, I'd happily undertake to deliver no software for considerably less than that. Any takers?

Andrew Simmons
Thursday, September 19, 2002

Let me repeat my first post in this thread:

>*I don't know* if the CLCS was supposed to be operating *at level 5* or not, *but* I thought that all NASA projects were *big on process*, so this certainly is not an example of Big Process succeeding.

Until some other evidence shows up I'll stand by my assumption that Big Process of one sort or another was engaged on CLCS.

X. J. Scott
Thursday, September 19, 2002

Personally, I don't wait to post "Joel's wrong"; it's just that in this case he was.

And, it would appear, so was I.  The front page credit belongs to Cheeto.  Thanks, Cheeto.

Still, I think my other comments are correct (albeit reiterations of Joel's principles).

NASA decided to throw away 30 years of development in order to build an all-singing, all-dancing, Java-using, web-enabled whizz-bang system from scratch.  This is why they wasted so much money.  The size of the process doesn't alter the size of the error.

Ged Byrne
Thursday, September 19, 2002

I'm not familiar with the space shuttle software, but as Dave pointed out in a previous post which everyone ignored, these are proper real-time systems. The software described in the article is a fly-by-wire system. No Windows, no DOS, no VB, no objects, probably no OS, no complex run-time libraries, no Pentium CPUs, no sub-micron silicon, etc. These systems must ensure sub-millisecond response times, immunity to high-energy radiation, extreme temperature ranges, etc. They use proprietary mil-spec CPUs. For all I know they may be programmed in assembler and use discrete logic (hardened transistors). So I think the normal commercial s/w development techniques are irrelevant here.

There, I feel better now.

Bill Rayer
Friday, September 20, 2002

I think much has been said (above) that is true to the bone. In addition, and IMHO, the people at NASA are (or, sadly enough, "were") trying to do all of the following at the same time.

1. Create a revolutionary new system and add the new, "necessary" features (which for the most part have grown out of the last 20 years of human-in-the-loop complexities)
2. Use technology that was not intended to be used in that context and probably has not been tested as much as some other technologies.
3. Project managers probably bit off more than they could chew for each release cycle. Add to that the "re-write the code from scratch" mentality.
4. Being "true soldiers", avoid politics :-) Not enough use of a "facade" ;-) when dealing with the "politicians" (in the literal sense of the term).
5. Last but not least (ha!), create the functionality that the old system already has.

I personally consider this a COLOSSAL, though not unheard of, failure of project management.

Sunday, September 22, 2002

I agree with Bill Rayer. The code is for real-time systems, not flashy software to attract users.

I don't know the technical details, but if they were thinking of re-writing the software in Java... I have nothing more to say.


(BTW, I am 23 yrs old and have zero experience in industry ;))

Monday, September 23, 2002

Well, only 4 or 5 years after the project started there's a final spec for Real-Time Java.

Art Vandelay
Monday, September 23, 2002

I like the emulate-and-update approach to this problem. NASA has no good reason to toss 30 years of tested and proven code in the garbage. If the code base was less than a megabyte, why didn't they invest the time and effort to study what that code did? If they can send people into space, they can translate "archaic" machine code!

The people against emulation state that it is difficult to emulate a hard real-time system. If the target system is 1000x faster than the legacy system, the time constraints imposed by the hard real-time system are not an issue.
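The 1000x claim is really a cycle-budget argument, and writing it down shows both where it holds and where it doesn't. A back-of-envelope sketch with purely illustrative numbers (none of these are NASA's actual figures):

```python
# Cycle-budget arithmetic for emulating a slow real-time machine on a
# fast host. All numbers below are assumed for illustration only.
legacy_hz = 1_000_000            # hypothetical 1 MHz legacy CPU
host_hz = 1_000_000_000          # hypothetical 1 GHz host
deadline_s = 0.001               # hypothetical 1 ms hard deadline
overhead = 50                    # assumed host cycles per emulated cycle

legacy_cycles = int(deadline_s * legacy_hz)   # work due per deadline
host_cycles_needed = legacy_cycles * overhead
host_time_s = host_cycles_needed / host_hz    # raw compute time on host
slack_s = deadline_s - host_time_s            # margin left in the deadline
# With these numbers the host finishes in 50 microseconds, leaving
# 950 microseconds of slack: raw speed really isn't the problem.
# The catch is that host OS scheduling latency must never eat that
# slack -- which is the earlier posters' objection, not throughput.
```

So the arithmetic supports the "not an issue" claim for throughput, but only under the added assumption that the host's worst-case scheduling latency fits inside the remaining slack.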

The fact that NASA or the FAA cannot update legacy systems because of incompetence does not worry me. What worries me is that failure is the expected norm and not the exception.

Mark Brown
Monday, September 30, 2002
