Article about Lines of Code
I know, Slashdot is not the place to find good articles related to anything close to management, but today they posted this article:
It's about programmer productivity, especially that lines-of-code thing. Nice article, but it feels incomplete and kind of naïve, IMHO.
Monday, March 18, 2002
This, of course, is the rationale behind Function Points and similar measures: the idea that you can quantify the amount of functionality delivered, rather than just lines of code produced.
The lines-of-code measure fits well with a short-sighted definition of productivity from manufacturing: number of widgets produced per unit of time. Well, on an assembly line, a widget built is a requirement satisfied. But as the article points out, a line of code written is not necessarily a requirement satisfied. Sometimes, a requirement can be satisfied by throwing out some lines of code and taking another tack. You end up with net negative lines of code, yet with a requirement satisfied. The manufacturing equivalent would be measuring, not widgets produced, but widgets passing QA (and in practice, this is a more common measure). And as someone said just recently (was it in one of Joel's articles? I cannot recall), a real measure of quality is requirements/expectations satisfied.
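To make the net-negative case concrete, here is a minimal sketch in Python. Every number in it is invented for illustration, and the `Change` record and the two helper functions are hypothetical names, not anything taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One hypothetical unit of work; all figures are made up."""
    lines_added: int
    lines_removed: int
    requirements_satisfied: int


def net_lines(changes):
    """Net lines of code produced: additions minus deletions."""
    return sum(c.lines_added - c.lines_removed for c in changes)


def requirements_met(changes):
    """Total requirements satisfied across all changes."""
    return sum(c.requirements_satisfied for c in changes)


week = [
    # A new feature written from scratch.
    Change(lines_added=400, lines_removed=0, requirements_satisfied=1),
    # A rewrite that throws out old code and takes another tack.
    Change(lines_added=50, lines_removed=600, requirements_satisfied=1),
]

print(net_lines(week))         # -150
print(requirements_met(week))  # 2
```

By the lines-of-code measure this imaginary week was worse than doing nothing, while by requirements satisfied it was a perfectly good week.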
Another reason why lines-of-code is an attractive measure is that it tends to be a fairly consistent measure (if neither a correct nor a useful one). This gets into basics of human comprehension, and how many ideas we can keep in our heads at one time. Now part of what attracts many of us to programming is that we're better than average at keeping things straight in our heads; but regardless, we have limits (on average) not too far beyond those of mere mortals. And regardless of the language, a line of code represents one "thing" (assuming you have proficiency with the language), one concept to keep straight in your head. So teams of similar intelligence and skill and experience and discipline tend to produce lines of code at a similar rate. But there are a LOT of qualifiers in that. Differences in experience and discipline may lead you to produce fewer lines of code, but more correct/maintainable ones, and more requirements satisfied in the process. Differences in skill and intelligence may lead you to accomplish the same number of requirements with far fewer lines of code. The net result, I believe, is that lines of code are a misleading predictor/measure of productivity.
Now I like Function Points as a concept, but I have little experience with them as a measure of productivity. (I have used them with some success in estimating, not measurement.) The idea is simple, though, and ties into the purpose of this article: productivity measured in terms of requirements satisfied per unit of time. In Garmus's Function Point book, he spends a lot of ink on measuring existing functionality, changing functionality, and new functionality, all with an eye to measuring how much you need to change in the code.
The problem is that two requirements may be very different in size/complexity; so simply measuring items checked off a TODO list may tell you something about progress, but it's not very predictable or quantifiable. Function Points are an effort to size and quantify requirements and functionality. I'm not yet persuaded that they form a correct, perfect answer to that problem, but they're an interesting start.
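As a rough illustration of the sizing idea, here is a sketch in Python of an unadjusted count. The weights are the average-complexity values commonly quoted for IFPUG-style counting as I recall them, so treat them as an assumption rather than a reference; the two requirements and the function name are hypothetical.

```python
# Average-complexity weights as commonly quoted for IFPUG-style counting.
# Assumption: a real count also grades each item as low/average/high.
AVERAGE_WEIGHTS = {
    "external_input": 4,
    "external_output": 5,
    "external_inquiry": 4,
    "internal_logical_file": 10,
    "external_interface_file": 7,
}


def unadjusted_function_points(counts):
    """Sum of (number of items of each kind) times that kind's weight."""
    return sum(AVERAGE_WEIGHTS[kind] * n for kind, n in counts.items())


# Two hypothetical requirements, each a single item on a TODO list.
requirement_a = {"external_input": 1}
requirement_b = {
    "external_input": 3,
    "external_output": 2,
    "internal_logical_file": 1,
}

print(unadjusted_function_points(requirement_a))  # 4
print(unadjusted_function_points(requirement_b))  # 32
```

Both would be one checkmark on the list, yet under these assumed weights the second is eight times the size of the first.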
Martin L. Shoemaker
Monday, March 18, 2002
I think it was Dijkstra (please correct me if I'm denying someone else their rightful credit) who said that Lines of Code should be counted as a Liability, not as an Asset; that is, each line of code is money spent getting the process to completion. Think about it:
If you can buy something for 100 bucks, there's no reason you should pay 1000 bucks for it - unless, of course, the 100-buck product requires lots of additional work and expenses, and the 1000-buck one is a turnkey solution - but this is rarely the case. Equivalently --
If you can do something in 100 lines of code, there's no reason you should do it in 1000 lines of code - unless, of course, those 100 lines are fragile and unmaintainable, and the 1000 lines are trivial - but this is rarely the case.
Every single line of code costs money - in review, while debugging, while explaining, while porting, etc. It's a little hard to quantify, but nevertheless every LOC costs real money.
Measuring productivity by LOC is much like measuring the success of a marketing campaign by money SPENT. While in the internet economy the latter was sometimes accepted (because money was too cheap), in general you want to minimize expenses (and thus, lines of code) to produce the same result. In both cases, a given challenge usually has an accepted minimum below which you wouldn't even try to go, and you usually don't care to be _exactly_ at the minimum as long as you're relatively close.
A small comment regarding the comment (I think originally JWZ's but also used by Joel numerous times) that "free software is free only if you don't value your time" - that is true. But there's a limit: A company I worked for installed Microsoft Exchange server a few months before I started working there. Original cost? ~$40K; Cost of a similar qmail setup? ~$2K in labour (definitely non-free, but possibly refunded in reduced hardware requirements for qmail; my estimate, based on the fact that the sysadmin in charge wasn't a Unix guy - if I set up qmail myself, it would probably be down to $200 - I've done it before numerous times). Was the $38K difference worth it? My opinion would be "no", but others would disagree. Was the $40K a one-time expense? Definitely not. Money, administration time for upgrades and service etc (all NOT improving the service -- just maintaining it) kept flowing. And now back to the subject:
qmail is ridiculously short for what it delivers, but its source is also not too readable for mere mortals and average programmers. Although it's probably not the "minimal" solution (in the Kolmogorov sense), I'll take it to be the shortest possible one in LOC for the sake of argument - I can't count at the moment, but the whole distribution is ~120K, and it includes manuals and setup code not required for actual execution:
If qmail's author, Dan Bernstein, had taken the time to write code that's more readable but less concise, he probably could have done it in 20-30% more LOC. Most probably, qmail is between 100 and 1000 times shorter than the Exchange sources. Exchange does more - perhaps even a lot more, but little more that is useful. Now, I ask - who's more productive - the Exchange Team, or Dan Bernstein?
In just about every respect, I vote for Dan any day, even though he's probably miles behind the Exchange Team in LOC/day. For example, since qmail-1.03 came out, I couldn't find a single bug report filed anywhere. Surely there are bugs, but it seems none of them matter enough for anyone to report - can you say the same about Exchange? [qmail probably delivers more mail than Exchange overall, btw - it's used by Yahoo, Hotmail and just about any other massive webmail/list management service].
Qmail's conciseness makes an effective security audit possible within a matter of days - you can read AND GROK all of its source; no dependencies on outside components, no huge structures - just plain code. That's probably one reason why it's so robust.
To sum up and to clear up, this is not criticism of the Exchange Team - they are producing a product that is easy to market and that produces significant income, and that is, in the end, almost all that matters. I just used it as an example to contrast with qmail, which is phenomenal in its feature-list/LOC ratio.
Tuesday, March 19, 2002
Dijkstra's statement is from his EWD1036, "On the cruelty of really teaching computing science", also published in CACM 32(12), Dec 1989, with many comments.
Tuesday, March 19, 2002
I hate code. It's all bug habitat. I try to write as little of it as possible. This post is onomatopoeic.
Thursday, March 21, 2002