Fog Creek Software
Discussion Board




Unexpected results in Open Source SE

Stephen R. Schach of Vanderbilt University presents the findings of a large-scale study in what he calls "Three Unexpected Results in Open-Source Software Engineering"
-----
Unexpected result 1: Linus’s Law is not applicable to open-source software development
Open-source software development is undemocratic

Unexpected result 2: Linux is unmaintainable
Common coupling is the problem
Category-5 global variables are the main culprit

Unexpected result 3: The LST result is false
Most maintenance is corrective (fixing faults)
-----

http://www.vuse.vanderbilt.edu/~srs/three.unexpected.ppt

Just me (Sir to you)
Monday, May 10, 2004

(Warning - PowerPoint slide show)

Joel Spolsky
Fog Creek Software
Monday, May 10, 2004

Result two was unexpected (and interesting) - but I can't really say the other two were.

a cynic writes...
Monday, May 10, 2004

Regarding point #1:
1. Open source software is typically free for download and use
2. Much open source software is released more often than the competing closed source software
3. Therefore the code-test-release-reportBugs cycle is more frequent

So in my own experience the periphery may not fix as many bugs as some have asserted. However, it does typically enable the more frequent development cycle and the corresponding improvement in software quality.

Please note that this does not address issues like usability, only code correction... and only for popular open source software.

Scot
Monday, May 10, 2004

Do we have any link to the actual paper? The PowerPoint presentation makes the authors appear moronic, so it would be nice to see the reasons behind the assertions.

Stephen Jones
Monday, May 10, 2004

I think it's a collage of several papers: http://www.vuse.vanderbilt.edu/~srs/files.for.homepage/references.html

Just me (Sir to you)
Monday, May 10, 2004

The guy appears to be an impressive researcher. I fail to see the point of just publishing the PowerPoint slides, though.

Stephen Jones
Monday, May 10, 2004

My guess is this is a set of PPTs that accompanies a talk given on a roadshow through the college landscape. It ties a few things the lab has done over the last few years together into a nice 60-minute package. Typically a URL is put up where the attendees can download the slides.

Just me (Sir to you)
Monday, May 10, 2004

I don't know why any of those three were unexpected.

I can't imagine a large development project being a democracy; every Open Source project I've seen has a Benevolent Dictator in charge. All large systems (like a powerful operating system) are difficult to maintain, and most of the cost of almost every project comes after initial development.

Tom H
Monday, May 10, 2004

Tom,

if you go through the presentation, the info is a bit more specific than that.

Just me (Sir to you)
Monday, May 10, 2004

The problem is you don't know whether other objections have been addressed. For example, he states that corrective maintenance takes up 40% of bug-fixing time, and that the earlier figure of 17% was the result of survey respondents being optimistic or wanting their enterprise to appear in a good light. But he hasn't taken into consideration the possibility that modern software produces many more bugs than 1978 software.

Stephen Jones
Monday, May 10, 2004

> But he hasn't taken into consideration the possibilty
> that modern software produces many more bugs than
> 1978 software.

Stephen raises a frightening issue. Yes, modern software is larger and more complex, but that would mean that modern development tools and methods have not only failed to keep pace with the rise in complexity but have in fact fallen behind what is required for improved reliability.

Are you telling us C++ is not a panacea? Are you saying it's worse for the process now than Fortran/Cobol were 25 or 30 years ago? Remember we didn't even have IDEs in 1978.

old_timer
Monday, May 10, 2004

Actually what I was thinking of was mainly that software now is released to so many more customers that there are going to be a lot more scenarios to deal with than there were then.

Stephen Jones
Monday, May 10, 2004

Democracy?  Whatever...

The more interesting part of the presentation has to do with global dependencies. 

The author counted 3 million LOC in the Linux kernel. Yet he could find only 99 instances of global variables? Really?

Small tangent:
Of the five types, I'm a bit confused by his use of "globals defined in nonkernel modules", which merely shows the crudeness often propagated by PPT-style presentations. There was likely other information he was trying to convey, because the only modules that can be linked into the kernel, either statically or dynamically, are kernel modules. Perhaps he meant that nonkernel modules are those not (typically?) statically linked into the kernel.

OK, but moving past that: in a project the size of the Linux kernel, having just 99 instances of globals is amazingly small. That the number of uses grows exponentially is meaningless, especially in the context of his example: current.

"current" is the thread which is running.  It is used everywhere, but written only in sched.c.  Even a kernel "hack" like me knows that (one of those "eyes" in the ever lucid diagram of "core" vs. "minions").  Oh, but someone could write a module that breaks "current" so easily, and down comes the OS.  BFD.  Thus the benevolent dictatorship of Linus.

Back to the democracy thing, I guess. Others have openly complained about the Linus source-control system (i.e., submit patches into the black hole named Linus). It's an ongoing thread of discussion better suited to those actually subscribed to the LKML (I am not) than to an NSF grant for writing coarse PPT presentations. Baloney.

Back to globals. 99 instances of globals, 23 of which don't even count! (His type 1: defined but not referenced.) This is a problem?

His type-4 and type-5 globals are likely ones defined and used in things like ALSA (Linux sound) or Linux video kernel add-ons.

I've written dozens of kernel modules (aka Linux device drivers) for different hardware devices and have only accessed globals through macros (some of the page-table mappings and access to things like network buffers go through what may be considered globals).
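
Concretely, the style looks something like this. This is a made-up example of the macro-wrapping approach, not an actual kernel API; all the names are invented:

    /* header provided by the kernel (invented example) */
    unsigned long page_table_base;   /* the underlying global, set up elsewhere */

    #define PAGE_TABLE_ENTRY(i) (((unsigned long *)page_table_base)[(i)])

    /* a driver never names the global directly, so if the kernel
       changes the representation, only the macro has to change */
    unsigned long read_entry(int i)
    {
        return PAGE_TABLE_ENTRY(i);
    }

The coupling is still there, but it is funneled through one published interface instead of scattered raw references.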

Interestingly, the author has become an "eye" in the land of the minions, contributing input to the open-source landscape.

Finally, I have some really shitty source from one of our hardware vendors that thought it would be a good idea to write DirectShow code and then adapt the result to Linux. It's a means of writing cross-platform code (har-dee-har). In one module (one C file) alone, they defined over 300 global variables with undefined scope.

99 globals (and really, take away the 23 "type 1" ones) in a code base of 3 million lines? It's like pissing in the ocean.

Conclusion: unmaintainable.  My ass.

hoser
Monday, May 10, 2004

The LST thing is very interesting since I've seen it in SE textbooks, and this shows that those numbers, and presumably ALL numbers in SE textbooks, are just totally made-up nonsense, since they come from an era before *anyone* kept accurate records.

What I really want to say is that it is a gorgeously presented PowerPoint presentation that clearly makes its points, emphasises key points wonderfully, and has delightful pacing and structure.

It is the first PowerPoint show I have ever seen with all these qualities.

Dennis Atkins
Monday, May 10, 2004

Hoser, the LOC figure he quoted for the Linux kernel was 14,256. The three million figure was for total LOC. As I always thought Linux was a kernel, this has me a little confused.

The LST figures were based on a survey, and were probably the most accurate figures available at the time. That they might as well have been made up is another matter. However, as far as I can see, the authors are comparing apples with oranges. The LST percentages are for time spent on maintenance tasks, while what they did was count tasks and work out what proportion the corrective ones were of the total number. If corrective tasks take less time than other tasks, the criticism doesn't hold. Also, it is far from clear that the types of software in the two samples were the same.
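
To see how the two measures can diverge, take some made-up numbers: suppose 40 of 100 logged tasks are corrective, but a typical fix takes one day while a typical non-corrective task takes four. Then corrective work is 40% of the tasks but only 40 / (40 + 60 x 4) = 40/280, roughly 14%, of the time, which lands near the old 17% figure rather than contradicting it.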

It is a very clear PPT presentation, as you said, Dennis, but the question is whether that clarity reflects the facts, and unfortunately we do not have easy access to them.

Stephen Jones
Monday, May 10, 2004

"if you go through the presentation, the info is a bit more specific than that"

Do they compare Linux to, say, Windows 2000 or an Apple OS? How many bugs are reported/fixed by non-core developers in other OSes compared to Linux? And I still don't understand the point of #1; someone makes decisions, well, okay...

They list some statistics about Linux (not Open Source in general, just Linux) and they make some mushy statements about them.  The presentation looks nice, but IMHO it doesn't say anything.

Tom H
Monday, May 10, 2004

One more reason that PPT is a lousy medium for technical presentations. It is believable that the bare-minimum kernel needed to boot is around 14K LOC. And perhaps when the author says non-kernel modules, he really means optional dynamically linked kernel modules (network drivers, audio drivers and plug-in subsystems, filesystems, etc.).

It would be useful to state as much.

hoser
Monday, May 10, 2004

Yes, it seems he does play loose with the facts. He includes in a chart, but does not call attention to, the fact that the Linux kernel is a much smaller proportion of total LOC than BSD's, which obviously has an effect on the ratio of globals coupled between inside and outside the kernel, and otherwise.

I liked the presentation's structure and the device he used of placing the shocking facts alone on a single page with no commentary after having built up to them. I think he has a good understanding of classical rhetoric.

Dennis Atkins
Monday, May 10, 2004

Actually, the LST analysis is very useful.

Also, the trend of global usage rising exponentially is useful as well. I think the conclusions are way off (unmaintainable, whilst Linux remains nicely maintained), but it does indicate that, with respect to its peers, Linux is relying on global dependencies too much. At least with a broad brush, this may be true.

The real analysis would look at "how" those globals are accessed and whether it makes sense to use them as they have been exposed.
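
And if the answer were "no", the standard fix is fairly mechanical anyway. A generic sketch, not a proposed kernel patch; the names here are invented:

    /* before: an exposed global any module can read or clobber */
    int debug_level;

    /* after: the variable becomes private to one file... */
    static int debug_level_internal;

    /* ...and everything else goes through accessors, so every
       write is findable in one place */
    int  get_debug_level(void)      { return debug_level_internal; }
    void set_debug_level(int level) { debug_level_internal = level; }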

hoser
Monday, May 10, 2004

Exactly! It's pretty clear that Linux is being maintained successfully, so the conclusion is obviously wrong.

Also, there is nothing to stop them from refactoring if they want to reduce dependencies; it's not like the code is etched in stone.

Dennis Atkins
Monday, May 10, 2004

Old timer:

"Stephen raises a frightening issue. Yes, modern software is more complex and larger, but that would mean not only that modern development tools and methods have not kept pace with the rise in complexity and have in fact fallen behind what is required for improved reliability."

Or it could, perhaps more likely, mean that programmers nowadays, in general, are not as 'good' as programmers from the 70s.

MR
Tuesday, May 11, 2004

Actually, that is possible. If you had to send your code through a terminal once a day, or once a week, when you got your share of the mainframe, then you would probably try to make sure that it compiled the first time instead of next week.

Stephen Jones
Tuesday, May 11, 2004
