Fog Creek Software
Discussion Board




refactoring- debunking the myth?

I take issue with Joel's recent "bathtub" refactoring essay.  Although I often agree with Joel, this is one subject where I have a difference of opinion.

My experience is that refactoring in place of ground-up redesign of applications leads to progressively more and more time spent refactoring, leaving less and less time for new features.

For example, take Joel's description of classes that were not designed, but essentially created on the fly.  What happens when it becomes apparent that this code is duplicated in multiple areas?  In a future refactoring exercise, that code will likely need to be consolidated.  In my experience this pattern keeps repeating, leaving less and less time for value-added work.  Ease of maintenance is only a benefit for a product where new functionality needs to be layered in.  If ongoing refactoring becomes an exercise unto itself, it ties up resources that could be working on new functionality.

I do believe that a complete rewrite from scratch is not a good idea.  However, I also think it's not a good idea to refactor periodically in place of well-thought-out design.  In some places, components will need to be rewritten.  In some cases, the product or certain features may not be functional for a certain period.  I think that is a necessary part of the development process for most complex products.

I'm definitely open to criticism on this topic, and I'd like to read some opinions to the contrary.

randy
Friday, January 25, 2002

This reminds me of the same battle going on in XP. In my opinion, it's not that designing up front is wrong; the mistake is to make a design that is too complex.

XP says do the simplest thing that works first. However, what is the simplest? Building an application for a single user and refactoring it to become multi-user can be quite expensive. The same goes for security: adding security later on in a (distributed) system is almost impossible.

So it is inevitable that you do some analysis first. However, the design phase should not take too much time before you start implementing. In my experience, you need the feedback from the code. Then go back, i.e. short iterations.

XP gurus, you'll probably not agree, so let's hear it!

Stephan Westen
Friday, January 25, 2002

I have done a lot of contract work in the past 20 years.  This often involves taking some project that is “almost done” and being asked to just fix the final bugs in the next few weeks.  In truth, the code is a mess -- routines that go on for pages, lots of nearly identical code.  In these cases the “code scrubbing” described by Joel Spolsky is very productive.  Not only does the code get better, often weird problems disappear, and it’s really the only way to get to understand the code.

I agree that at this stage, no new features or logic changes should be made.  This makes the job pretty mindless, and lets you learn what you have before you really change it.

Often, after that first phase there is a need to take another look.  Westen writes about adding features, for example, changing from single to multi-user capability.  At this point, you can reconsider either replacing or ‘fixing’ certain portions of the code.  But you do so after gaining a lot of knowledge first.

Cleaning up code isn’t always fun, but if it’s someone else’s code you get the ego boost of smugly making something nice out of some awful mess.  If it’s your own code, you get that humbling realization: “what was I thinking?!”

Joel Goldstick
Friday, January 25, 2002

There are two things:  Refactoring by itself, and refactoring within XP.

* For normal refactoring, it is pretty clear that we can agree that it is not always good.  Sometimes it's pointless.  If you're embarking on a pointless refactoring, the probability is that you're doing other pointless things, and refactoring is not the problem.

* For refactoring within XP, it must be kept in mind that refactoring exists because of the main problem of XP:  Quick designs often don't age well.  XP is a bunch of self-reinforcing practices, and some are there to compensate for others.

One main thing to keep in mind is that XP is not intended for large projects.  And for a distributed system that needs security later -- there is some knowledge (and test cases) gained by writing without security.  While it may sound unfun to shoehorn in security as a large global change, at the same time books like The Mythical Man-Month advocate building 'prototypes,' which are along the same lines.

For enduring software, it can take a couple of iterations.  Sometimes those iterations are difficult to achieve within certain business constraints.  But these things need to be kept in mind rather than hidden from view.

Besides, I think that XP never says, "Don't think."  If you know that your design is nontrivial, then spend a nontrivial amount of time thinking about it.  Don't let a Methodology think for you.  Just don't overcomplicate things, or you may never ship.

Richard Denker
Friday, January 25, 2002

XP to me means the new Microsoft OS.  But here it seems you might mean something else.  Am I missing something?

Joel Goldstick
Friday, January 25, 2002

http://www.extremeprogramming.org/

tk
Friday, January 25, 2002

Martin Fowler's book "Refactoring" recommends refactoring as a means to add new features, not as useless code masturbation. You refactor the code when you would like to add a new feature, but that feature does not exactly fit into the current design. So you put new development on hold momentarily. You refactor the current design so that you have a logical place to insert your new feature. Then you stop refactoring and begin coding your new feature. Refactoring is a benefit to new feature development, not a distraction.
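A minimal sketch of that two-step rhythm, with every name invented for illustration: two report functions once duplicated their amount formatting, so a new currency-prefix feature would have meant editing both. Step 1 is a pure refactoring that changes no behavior; step 2 adds the feature in the one logical place the refactoring created.

```cpp
#include <string>

// Step 1 -- pure refactoring: consolidate the formatting that was
// duplicated across the (hypothetical) invoice and receipt code.
std::string formatAmount(int cents) {
    return std::to_string(cents / 100) + "." +
           (cents % 100 < 10 ? "0" : "") + std::to_string(cents % 100);
}

// Step 2 -- the new feature now has exactly one home.
std::string formatAmount(int cents, const std::string& currency) {
    return currency + " " + formatAmount(cents);
}
```

Because step 1 changes no behavior, existing tests keep passing through it, and the feature in step 2 touches only the new overload.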

cop
Friday, January 25, 2002

it's been my experience working with XP that you are supposed to find out what's REALLY important to the user and THEN make your decisions on what's the simplest course of design.  So in the event of the distributed system, it would become apparent fairly quickly that security was an issue and then you could prioritize your features to include it well before the distributed portion was fully functional.  This is much easier than getting the distributed portion fully working and then implementing the security system, hence it is the "simplest thing first" approach.

however, I have a question in regards to the "never start over" views espoused in the article and refactoring in general.  How do you handle applications written in old, out of date technology, or worse, the wrong technology?  Suppose you go to work at a company that has been trying to provide internet functionality in COBOL for some time and  is hitting roadblocks?  It's an extreme example, obviously but it's no worse than some of the other decisions I've seen.  Obviously, there are other technologies more suited to this kind of work, so it would seem smart to advocate a re-write.  Would you advocate a plug-in architecture instead, where the new language and applications tie into the old?  What about moving from outdated systems (OS/2 to Unix or Windows)?

Ed Goodwin
Friday, January 25, 2002

Ed Goodwin writes:

What about moving from outdated systems (OS/2 to Unix or Windows)?

That can be done with the scrubbing method Joel describes.

The first pass is to refactor out non-portable code into a set of well-defined non-portable code.
The second pass is to define portable interfaces to the nonportable code.
The third pass is to replace the nonportable code with portable code.

It can be "non-fun", as someone else put it, but it works (I've done it), and it avoids the "second system syndrome" that often plagues redesign/rewrites.
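The three passes can be sketched in miniature, using a path separator as a stand-in for a platform dependency (all names here are invented for illustration). Passes 1 and 2 gather the non-portable detail behind a well-defined interface; pass 3 then swaps only the implementation of that interface.

```cpp
#include <string>

// Passes 1 and 2: the non-portable detail lives in one place, behind
// a portable interface.
std::string pathSeparator() {
#ifdef _WIN32
    return "\\";   // pass 3 replaces only this implementation per platform
#else
    return "/";
#endif
}

// Callers like this one are portable and never change across the port.
std::string joinPath(const std::string& dir, const std::string& file) {
    return dir + pathSeparator() + file;
}
```

The payoff is that the third pass is mechanical: the callers are already insulated, so each platform-specific function can be replaced and tested in isolation.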

Tim Lesher
Friday, January 25, 2002

Redesign the whole system from the ground up, using UML,

But implement it using the current code base.

Adam Younga
Friday, January 25, 2002

I have had the "pleasure" of doing the same that Joel has done. I took a 65KLOC piece of C++/VB and via successive passes through the code reduced it to 24KLOC. There was still duplication and so I believe 18KLOC was achievable.

Here are the benefits:
1) no dead code or unused "features"
2) no strange hacks that were used to get the dead code and unused features to work
3) no code with comments like: " // temporary until xxx happens "
4) weird one-off bugs disappeared (**)
5) compile times dropped
6) search times dropped
7) testing time dropped
8) adding features was a piece of cake in the code I'd cleaned up (in the undone code it was still iffy)


** This in fact was worrisome. I often wonder if the bugs have really disappeared. But they didn't show up in unit testing, system testing, or QA. Thus it wasn't THAT worrisome.

John

JohnA
Friday, January 25, 2002

It's interesting how strong the sentiment is to try and debunk Joel on this. Reinforces to me that Joel is probably right.

pb
Friday, January 25, 2002

As with many other things, there are shades of reality between "always rewrite from scratch" and "never rewrite from scratch". I worked on one particular project over 3 releases (the 3rd through 5th) that had around 100K lines of source code. In the 4th and 5th releases I was able to do many little refactorings and take advantage of some new language features to fix bugs, clarify the program flow, and deal with new requirements. This was good.

Ultimately, though, major flaws in the original architecture became very apparent. Adding a new feature could touch code in dozens of places, and the towers of delegating classes had gotten way too tall. It was clearly time to do a rip-and-replace on a major chunk of the code, around 1/3 of the whole.

Of course, I wasn't able to convince the client of this. The minor tweaking strategy had been so successful for three releases that they weren't prepared to commit the money for a major rewrite. So, I declined to contract for another release, and sure enough, the 6th release came out with pretty much all the same bugs as the 5th one.

Can't tell you who the company was, thanks to NDAs, but it starts with M.

Mike Gunderloy
Saturday, January 26, 2002

The message that started this thread talks about two options:

1) create a detailed design in advance of coding, then go code it

2) evolve a design, from a simple starting point, with refactoring and related practices which reinforce it.

There are endless discussions about the merits of these options, and experience has driven my thinking from #1 towards #2 over time.

But that's completely irrelevant to Joel's situation or what he wrote about!  He had an old and successful code base; he was *not* at the beginning of a project... he was (is) years into it.  His set of choices was this:

(1) ditch the code and rewrite, perhaps with extensive up-front design.  We all know how Joel feels about this option.  It would have taken far more than 3 weeks, by the way.

(2) leave the code alone, making small changes to add in additional features.  Apparently this was getting painful.

(3) refactor it to make it better, to improve the design of the existing code.

Notice that all of these choices are things that can be done in the present, deep into ongoing product development.  He chose #3, which is the same choice I most often recommend and make.  Regarding option 4:

(4) reach into the past with a time machine, and tell yourself to have designed the product much better at the beginning, to accommodate all of the ideas that came up since then that you didn't even know about at the time.


#4 may or may not have been ideal, but it's definitely fictional.

Kyle Cordes
Saturday, January 26, 2002

Kyle,

There is another option.  Since refactoring is a bottom-up method, one can architect a system top-down.  In the spirit of "respecting the old code," this can be done with high amounts of code reuse.

Fortunately, this is not an all-or-nothing task.  You can refactor the old code to make code reuse simpler for your new design.  Code is fluid, and you have lots of choices.

BTW, to pb:  I notice only one person who dissented with Joel.  Maybe it would be healthy if there were more. :-)

Robert Quentin
Saturday, January 26, 2002

> For example, let's take Joel's description of classes that 
> were not designed, but essentially created on the fly.
> What happens when it becomes apparent that this code
> is duplicated in multiple areas?
Well, Joel says that one of the things he was doing was removing duplicated code, so that shouldn't have come up too often.

> Ease of maintenance is only a benefit for a product where
> new functionality needs to be layered in.
Well, Joel talks about adding new features in later, so it appears that ease of maintenance would be a benefit here.

> I do believe that a complete rewrite from scratch is not a
> good idea. However, I also think its not a good idea to
> refactor periodically in the place of well thought out
> design. In some place, components will need to be
> rewritten.
Betcha thought I was going to start with 'Well, Joel says...' again, didn't ya?

Here's my take on the Big Design Upfront vs XP design as-you-go battle. I think the XP thing is really more about not doing design upfront using speculation in place of actual information.  Don't fool yourself into thinking that you have an estimate just because you got some developer to say that (s)he *thinks* it should take two days to build a feature that hasn't been defined yet. Don't fool yourself into thinking you have a plan just because you *think* you know what future requirements will be. Wait until you have facts, and have the confidence in yourself to not make your decisions until then. Once you actually know, then feel free to use sound design principles, patterns, UML, whatever you like to plan out the work.

I think Martin Fowler has some interesting things to say on the balance between upfront design and as-you-go design, especially given his UML background, in the beginning of Extreme Programming Examined (ISBN 0201710404).

Pretty much my only issue with what Joel said was his claim
  "But I was never doing the types of things that cause bugs."
I've said that to myself many times, and I was usually wrong.

Roman Zabicki
Saturday, January 26, 2002

(1) I agree that the most useful form of refactoring is the kind that "makes space" for a specific new feature.  There are a couple of other good reasons - e.g. improved performance - for refactoring, but most refactoring is just masturbation.  By definition refactoring is moving sideways, and moving sideways is sometimes necessary before you can move forward, but there's something wrong with people who move sideways all the time.  Insert various puns about crabs here.  Either such people are no longer interested in moving the project forward and should be moved to a new project, or they lack the skill/vision to move anything forward and should find another profession.

(2) There's an optimal rate at which refactoring should occur for any given project.  It's probably not "never" and it's probably not "all the time" either.  For most projects it's somewhere in between.  The trick, as always, is to learn how to make *intelligent* situation-appropriate decisions about when to refactor and when to leave well enough alone, instead of buying into some inflexible dogma that always pushes you in one direction.

Jeff Darcy
Sunday, January 27, 2002


I agree that refactoring is rarely needed. But, usually, when more than one developer seems to agree that a code rewrite is necessary, a "refactoring cycle" must be scheduled.

(loved the idea about crabs...)

Leonardo Herrera
Monday, January 28, 2002

I think it's important to remember that refactoring != rewriting.  Rewriting significant pieces of code is a drastic action that should only be taken under extreme circumstances, for all sorts of reasons that Joel has written about.  Refactoring involves rearranging or restructuring code without actually replacing it, especially via "zero-impact" transformations that provably have no effect on functionality.

Refactoring therefore does not carry the risk of losing "hidden knowledge" embedded in the old code (as is the case with rewriting) but IMO it's still mostly a waste of time unless you have some specific need to rearrange the code.  If you can't explain cogently *why* you're refactoring and what precise goal you hope to achieve, there's a high probability that you'll have to refactor it *again* if/when you do need to add an actual new feature or discover an actual maintainability problem.  As I said before, some refactoring is always necessary, but constant refactoring is what code-diddlers do instead of productive work.
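One way to see the "zero-impact" idea concretely is to keep the old and new forms side by side and check them against each other; the example below is invented for illustration. The "before" version duplicates a minimum-charge rule in both branches; the "after" version factors it out, and a test can verify mechanically that behavior is unchanged.

```cpp
// Before: the minimum-charge rule appears in both branches.
int shippingBefore(int weight, bool rush) {
    int cost;
    if (rush) { cost = weight * 3; if (cost < 10) cost = 10; }
    else      { cost = weight * 2; if (cost < 10) cost = 10; }
    return cost;
}

// After: the duplicated rule is extracted; behavior is identical.
int applyMinimum(int cost) { return cost < 10 ? 10 : cost; }

int shippingAfter(int weight, bool rush) {
    return applyMinimum(weight * (rush ? 3 : 2));
}
```

This is the sense in which a transformation can "provably" have no effect on functionality: each small step (extract function, simplify the branch) is one where the equivalence is easy to argue and easy to test.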

Jeff Darcy
Monday, January 28, 2002

The purpose of refactoring should not be to clean up existing code for the sake of cleaning it up.  It should center around the need to add or update features in the code base in a timely manner.  The reason Joel refactored FogBUGZ was not just because the code was a mess but because it was becoming more difficult to implement new features in a timely manner and easily support those new features.

In one project I inherited, the code base was a jumbled mess that had been thrown together. The next version called for a reimplementation of a feature that had been removed because it did not behave in a predictable manner.  The feature was to be a core feature of the application.  Every attempt I made to implement the feature in the current code base failed because of the mess that existed on top of it.  So I created a new project and implemented the feature completely separate from the current code base.  Then I ported each function from the current code base into the new application until I had reproduced the application's functionality.  Along the way, I was able to introduce new functions that allow for greater flexibility and expandability when I start the next version. I was also able to learn the entire application inside and out.

Given the choice, I would have liked not to have had to refactor the entire application.  But it seemed to be the only choice given the new set of feature requirements that I was supposed to implement.

Jacob
Tuesday, January 29, 2002

I've disagreed with Joel's blanket statement that rewriting is always wrong since I first saw it. While I don't disagree with all of his reasoning, I disagree with the conclusion. Here are some of the mitigating factors in my mind:

1. Joel's canonical example of Netscape's disastrous rewrite effort misses one really important point: Netscape's first mistake was not in rewriting, it was in failing to have a shipping product to fund the rewriting! That was just dumb business, but the technical decision that a rewrite was needed may have been sound (although it may have been executed poorly...).

2. Refactoring an application that is 30K to 60K lines (something one expert with the app could do in about 3 weeks) does not apply to a great many software applications -- or more generally, codebases -- that are out there and have hundreds of thousands of lines, or even millions. Refactoring only works if the code is understandable, so you can accurately assess the effects of what you are doing while you refactor, and if you have unit tests to check for correctness. Again, this does not apply (sadly) to many codebases that are around.

3. Every application is built with a core set of fundamental assumptions about it -- central or distributed, lightweight or for big iron, a handful of transactions or millions per day, etc. The consequences of these fundamental assumptions filter throughout the codebase and influence many subsequent design decisions. So refactoring a calling sequence or moving a member function from one class to another is not really comparable to changing the fundamental assumptions of the application or codebase. While you could probably mathematically prove that any redesign could be achieved through a series of such simple refactorings, that is not necessarily the most effective/efficient way to do it.

4. To me, code itself has no particular value. The knowledge of what the application needs to do that is embedded in the code is what is valuable. That knowledge should be mined during rewrite efforts in order to reduce the re-implementation time. A rewrite that takes as long or longer than the original writing is probably reaching too far (see Brooks' Second System effect) or being poorly executed.

I think there are any number of valid ways to improve a product or a codebase, including both refactoring and rewriting (or partial rewriting). But if the activity, whatever it is, is not planned, managed, or executed well it can fail.

Jeff Kotula
Tuesday, January 29, 2002

I think it would help if Joel was honest to himself and admitted that refactoring code "in-place" was actually re-writing.

James Ladd
Wednesday, January 30, 2002

"I think it would help if Joel was honest to himself and admitted that refactoring code "in-place" was actually re-writing. "

If so, it's re-writing a piece at a time.

The only re-write experience I've unfortunately been involved with was one of those noble "re-write from the ground up" efforts.  In this situation, a planned new feature was going into the rewrite.  When the schedule slipped (of course) we ended up losing some great deals because the new feature was in a non-shippable code base.

Remember one of the arguments laid out in favor of refactoring over a rewrite: you always have a shippable product.  Joel said he could have shipped his code at any point in the refactor.

Bob

Bob Crosley
Wednesday, February 06, 2002
