Fog Creek Software
Discussion Board




never never ever do a code rewrite?

i just joined a startup company a month ago where i'm in charge of a java web application that was written by several part time programmers over a span of a year.  the code is a case study on anti-patterns --  EVERY method is a static method (i'm not kidding), no separation of presentation tier and business tier (lots of java code in JSP's, lots of embedded HTML in servlets) -- no signs of thoughtful design whatsoever.  oh yeah, and it works *sometimes*. =)

of course i can suck it up, work within the existing framework and accept the fact that it's going to take the team 10 times longer then it normally does to code something.  however, there will come a point in which the code will be too unmaintainable and too inflexible to adapt to change.

my general question is this:  do you believe that there is any circumstance in which a code rewrite is wise?

al
Friday, April 23, 2004

I'm assuming the code works, right?

In that case, you really need to preserve the working code in order to preserve all the knowledge it contains. There are probably things in there that are very useful that you would have to learn the hard way if you started from scratch.

What I would do in this situation is budget a certain amount of time for refactoring, during which you carefully clean up and rearrange the code while keeping it working at all times. You'll save 90% effort if you apply small, simple transformations that improve the code one step at a time, and you'll learn more about the domain as you do it. By the time you're done you'll have a nice clean body of code that works great and feels great and you'll understand it better, and it will only take a fraction of the time it would take to start from scratch.

Joel Spolsky
Fog Creek Software
Friday, April 23, 2004

It seems to me that refactoring, while generally a brillaint approach, can't really do much about a truly piss poor architecture.

If the program is poorly architected (with spaghetti code and no modularization) then you can never find a "chunk" to refactor.  You MIGHT find a thread to refactor, but ever time you refactor another thread, it doesn't get any easier until you're down to the last thread of code weaving through 50k lines of ...mess.

Mr. Analogy
Friday, April 23, 2004

I've had to refactor some doozies, and I've found that no matter how bad the existing architecture is, I can set up some kind of iron curtain of an interface and rework the internals behind that.

Devil's Advocate
Friday, April 23, 2004

So Joel, what do you do if the code doesn't work?

Tito
Friday, April 23, 2004

I've know quite a few developers over the years who say "this architecture sucks!".. mostly from the 'not invented here' attitude.

I think it is tempting to look at a big lump of code which has some flaws and is undocumented ( as I do with various legacy products in the company ) and say "let's re-engineer!", but the reality is that I would prefer to maintain that code in its current working state, and spend my coding time on new projects.

Andrew75
Friday, April 23, 2004

"This architecture sucks!"
"Does it work?"
"Ummm... yes."
"Can't suck that bad, then, can it?"

The absolute worst ball of mud to rewrite is a codebase that has been in production for years - that's what you have to refactor to find jewels like

//Have to increase the FactorCount by 1 because the HR
// system uses base 1 in one module - not incrementing
// it creates an incredibly hard-to-find bug (I just wasted
// two weeks on  this)
iFactorCount++;

In a "let's rewrite it from the ground up!" this would *never* be found and you've possibly just bought another two weeks of bugfixing.

Philo

Philo
Friday, April 23, 2004

======================================
//Have to increase the FactorCount by 1 because the HR
// system uses base 1 in one module - not incrementing
// it creates an incredibly hard-to-find bug (I just wasted
// two weeks on  this)
iFactorCount++;

In a "let's rewrite it from the ground up!" this would *never* be found and you've possibly just bought another two weeks of bugfixing.
======================================

One could also argue that in a ground up rewrite there would be no variable iFactorCount to deal with anymore.

A little out of touch, a little insane.
Friday, April 23, 2004

I think there has got to be something between refactoring and rewriting. 

I've read the refactoring book, and used the techniques successfully to take OK code and make it better using small incremental changes.  It was fine because that's all "OK" code needs.  It doesn't need a major rehashing.

I've seen others rewrite a slimy hairball of code from scratch and encounter many of the downsides Joel discusses.

What I've done when confronted with a slimy hairball of code (and I've always referred to it as a rewrite since I don't have any other term for it) is to create a new project with a better structure of objects and methods and then bring over the old code into the new structure.  Sometimes a block of code can be cut-and-pasted into one of my new methods verbatim.  Other times its so against the grain of what I'm doing that I just use it as a guide as I write the desired functionality afresh.  Other times I take it and refactor it before putting it in its new home. 

Is there a word for this?

It seems to me that Joel's "never rewrite" rule is too strict.  I agree, never rewrite a whole codebase, but I think the considered discretion of a seasoned and intelligent engineer is the best approach

Ken Klose
Friday, April 23, 2004

I think you're on to something there Ken.  I've dealt with some nasty code before, and I follow more or less the following process.

1) move the monster into a code-ghetto (http://fishbowl.pastiche.org/2003/05/28/the_ghetto_minipattern)

2) Create a better structured API to achieve the same task

3) Pull the code into the new structure one piece at a time.

4) Regression test

Basically it's a fairly large scale refactoring,  and it may break at times.  But it's still gradual enough to avoid having an important piece of knowledge leave.

Koz
Saturday, April 24, 2004

When I've had some bad code, I'm often surprised at actually how much refactoring can be done, even on the worst code.  Even if you can't be successful, attempting to refactor can reveal a lot of insight into the issues.

I did violate Joel's rule once and get away with it.  I took a gnarly piece of open source code with its own portability layer and spent a few weeks refactoring it.  I also discovered a spec on the Internet for the main algorithm of the code.  Taking my knowledge gained during the refactoring and using the spec, I started from scratch with an open source portability framework and reimplemented the code.  It worked and I realized that it was the right idea: refactoring the rest of the way would have taken a lot longer.  Being written in plain C and with a custom portability layer made a fairly simple algorithm into a huge amount of code: most of which was replaced by a standard C++ library.  I still agree with Joel's rule in nearly all cases, despite my exception.

Daniel Howard
Saturday, April 24, 2004

I think you got away with that because you had a clear specification to work against. Usually, the specification *is* that old body of code that you want to throw away.

Frederik Slijkerman
Saturday, April 24, 2004

I have had great success recently refactoring a big ball of hair into a squeaky clean class. The secret was to write unit tests beforehand. The tests took a day to write, then I could sail through the refactoring. As a bonus I now have some automated tests to use.

I'll also second what Philo pointed out. In quite a few places I noticed little gems that I would have had to learn the hard way if I had started again.

Matthew Lock
Saturday, April 24, 2004

You cannot refactor with confidence without unit tests. If you don't have the tests, then you tread too lightly, and go too slowly. It may sound counter-intuitive to someone who's never done it, but those unit tests let you be speedy and somewhat reckless, because you always have that green light backing you up.

Brad Wilson (dotnetguy.techieswithcats.com)
Saturday, April 24, 2004

Joel,

one thing I find troubling about the no-rewrites policy is that rewrites often aren't.

You start from a blank sheet of paper because you need to clear your head and do the architectural decisions right. That doesn't mean you can't borrow code and other implementation details from the old tree, where applicable.

A piece of code with no explanation at all and the developer since long left is a disaster waiting to happend. It is not an asset of the company.

Jonas B.
Saturday, April 24, 2004

More on the unit tests.... while refactoring I never actually needed to start the application being refactored - I just kept rerunning the unit tests. It's so much quicker to develop that way and like the other post said you don't need to tread lightly.

Matthew Lock
Saturday, April 24, 2004

I've done rewrites of significant chunks of code.  The code reached a point where it would be more difficult to add anything without breaking what was there.

They keys to making a rewrite work are:

- Don't replace the old code until and unless the rewritten code can exactly duplicate the functionality of the old, or all the discrepancies have been properly identified as bugs in the old code.

- Use a separate development environment for the rewrite.  Keep the old crappy code base alive until the rewrite is done.

- Don't replace the old code until and unless the new code is actually in better shape.

- To accomplish the above, you *gots* to have very good and massive regression tests.  And you need to regression test the old code against itself.  Sometimes the reason why old!=new is that the old code has bugs that cause it to produce inconsistencies with itself.

- Make enhancements only *after* the regression tests prove that old=new.  Ongoing maintenance changes should be made to the new only if they are also made to the old.

If the rewrite is a "must ship on this drop dead date", don't do it.  You have to have the option of delaying or throwing away the rewrite if it isn't obviously better than the old code.

T. Norman
Sunday, April 25, 2004

The second sentence should start "The old code reached a point..."

T. Norman
Sunday, April 25, 2004

What happens when the code works, but it's getting progressively worse?  The company I work for has an app with poor architecture.  Every new release results in slightly slower performance.  Since each new feature only adds an apparent neglient decrease in performance, the features can be justified.  However, all those .001 second performance decreases eventually add up. 

And now people are asking, "Why is the app so much slower now than it was 2 years ago?".  It didn't happen overnight.

hightequity
Sunday, April 25, 2004

I often refactor from the top down.  Replace what's in main() with something new and then fit all the functionality into some framework that makes sense.  This doesn't have to take very long, I refactored a 15,000 line watchdog program with 17 different "actions" in an afternoon after banging my head against the wall trying to add the 18th.

If on the other hand there's some fundamental data structure that is making everything difficult, refactoring that is going to be a complex and error prone activity.

One place where throw away and start over helps is with inhouse programs where 90%+ of the functionality is no longer used/important.  I did this with a schedule manager, replaced the old with a new with 5% of the original functionality, and then added in the 10% more that people complained were missing.  Much nicer (and smaller) program resulted.

Derek W
Sunday, April 25, 2004

Derek,

I think you've nailed it with the "functionality no longer required" comment!

For the past seven months I've been working on re-writing an old legacy application. (It is actually simultaneously a re-write and a port, since the application is going from an ancient C/C++/MFC hack that has been ported from UNIX to WinNT, and being re-done in Java - so strictly speaking it's not a classic "re-write").

The point is, this application has had mountains of features hacked onto it over the years, most of which are no longer used, or only used by a tiny base of customers trying to do something special. The architecture is horrendous and these features are embedded it in, making large "chunks" impossible to identify and rip out in any kind of major refactoring.

Doing a complete redesign, with elegant architecture and only the basic feature set implemented at first, will allow for this product to develop and grow properly on it's new platform. We could have spent months sifting through bug fixes for issues that were only a problem under some 1993 version of QNX, which at this point is a pointless excercise considering the feature that code comes under is probably no longer required.

Perhaps this situation doesn't quite fall under the classic "re-write" category since it's a functionality and platform direction change, but I'm surprised more people haven't written in with a similar story.

Adrian Planinc
Sunday, April 25, 2004

I have a question for all of you with experience doing unit tests.

What if the code is so horribly intermingled and full of global variables that it is nearly impossible to write a unit test without including the entire project?

Is there some magic bullet you have come up with or do you have to refactor just to start a reasonable unit test?

In the past, when trying to write unit tests for "ball of hair" projects, I ended up giving up because I could either write a trivial unit test that barely tested anything or the unit test was such a massive mess trying to set up all of the global varialbes and such that it was worse than the awful code.

Kyle
Sunday, April 25, 2004

Unit testing an already-written project is always going to be a bear, no matter how well it's written. I'd suggest you should unit test just the things you add while refactoring.

Ham Fisted
Sunday, April 25, 2004

My ball of hair was a web application, so I made my "unit tests" by just automating IE from perl and screen scraping. I guess they are strictly black box tests, but they were invaluable.

Matthew Lock
Sunday, April 25, 2004

>>
What if the code is so horribly intermingled and full of global variables that it is nearly impossible to write a unit test without including the entire project
>>

A trick that I've used to unit test was to write information to several logfiles based on functionality that represented various ouput before being written to screen/sent across the wire/Before and after leaving functions.  These acted as my 'base' results.  Be granular with functions you plan on re-writing.  Log all inputs at the beginning and outputs before exit.  Also make sure you log values before and after the function call


As you refactor, compare your output files with vdiff/diff.  Any differences, look at them closely.  If you determine your new file is 'correct'- this becomes your basefile.  This is especially helpful with untangling global variables.  Its tedious but the nice thing about this method is it is easy to compare text files and the are easy to version and go back if necessary.

Hope that helps.
Mike

MikeG
Monday, April 26, 2004

Honestly, you guys kill me. Why not just say it depends. I've spent 3/4 of my career working on code written by other people. Some of it can be refactored and some of it can't.  FWIK, Joel has not really written much code anyhow other than his own. I suggest folks work at small mom and pop shops that get code written from part time programmers or from Russia or India before you dismiss the re-write. Recently I had to add an item, one item, to a *shopping cart* and I had to change 6 files, at least 6 files. This kind of thing should be done from a web interface or some config file but it's not. Believe me, I don't think many of you understand the kind of code we're talking about when talking about a need for rewrite. However, I do admit that I consider refactoring more now. The the idea of the code gehtto and a nice interface while you work on moving code out of the ghetto is good, if you can do it.

must remain anonymous
Tuesday, April 27, 2004

"or the unit test was such a massive mess trying to set up all of the global varialbes and such that it was worse than the awful code."

Then it's not a unit, then. Unit tests are good, if you've got nicely defined and implemented units to begin with. If not, refactor until you do, then bring the unit test in.

Unit tests aren't a panacea - they mainly serve to validate that maintenance fixes don't break any existing expectations. Nothing like having ten people annoyed at you for breaking the build. Unit tests prevent such unpleasantness. But it is possible to ship buggy code, that has passed all its unit tests - believe me, I've seen it.

anonymous strikes again
Wednesday, April 28, 2004

I've never read the official canon on refactoring, so this stuff may be obvious.

But when I clean up code I basically have two kinds of activities :

* clustering (putting things that should be together, together)

* generalizing (making things parameterizable)

And I don't try to do the two at the same time.

Typically I start by trying to pull apart stuff which is just stuck together confusingly. For example, if a source file or an object seems to be doing several different weakly related things, I just split it into several smaller files / classes.

Often, at this stage I create files or classes called "XSystem" of which there's only meant to be one instance, and which does the X-related stuff.  Eventually these classes are going to go away again.

At that point, the XSystem object is still intertwined with other parts of the code. So then I start to try to put a more robust inteface around it. Functions which use values from global variables or other shared context, are changed to get those values from parameters. For each function changed, I try to track down all the calls to it, and change them, there and then. Sometimes the calling function doesn't have the necessary data. So at that point, I change the calling code to get it from the global or shared context.

So these are small changes, pushing references to globals back up the calling stack ... and testing the code still works each time. (Sometimes with unit tests, but that isn't always possible.)

As the code starts to be untangled, some more generalizations and possible parameterizations  become obvious. For example, you can often see where code from one place has been copied and modified in another. Now it's decontextualized, you can also see better how to replace both uses with a more general function or class.

This isn't a single pass process of course. Once you've made the quick-wins for generalization, it's back to more, finer-grained clustering and re-organizing. The XSystem can be broken into YSystem and ZSystem, and the whole thing re-iterates.

I'm scared of premature generalization, so I'll sometimes see a potential to pull stuff out into a shared base-class or strategy or delegate, but leave it  for a couple of iterations, until I feel I know the code and understand the domain better.

Usually, the time to do that generalization is very late. And trivially simple.

phil jones
Friday, May 07, 2004

Just noticed that the original questioner was talking about a web-app with currently a lot of business-logic in JSP pages. So here's a description of my current cleaning up of someone else's intranet application in ASP.

1) I started by pulling all the big, hard to read, pages apart into lots of little sub-pages, squirting these blocks of mixed code / HTML back into the main page with server-side #includes, which in this situation are a great boon.

2) Then I turned many of those sub-pages into procedural functions and supplimented the #includes with trailing calls to the functions.  And I embedded the HTML as response.write statements in the functions.

Step 3) was to parameterize those functions in order to remove references to the page's globals. And I also began pulling related functions that shared data together into classes.

Up until this point, I still didn't try to *generalize*.

But with the functions more cleanly encapsulated, I could start to discern certain typical components like tables and navigation bars and charts. I merged similar pieces together into more parameterized components; and polished them up a bit, including making the HTML embedded in the code more data-driven and flexible.

Now, at the present time, I still have all the code in VB-script. About 90% of the code is in separate #included files with more or less meaningful names, and 50% is in objects. In some applications it might be appropriate to move some of the classes into ActiveX, compiled server-side objects for speed, or access to other operating system services.

And I still have HTML embedded in code. I think that isn't a problem here, because it's very unlikely that any non-programmers are going to work on it in the forseeable future. But if you are moving classes "deeper" into the system, then you may need to separate the code from HTML. My advice is simply to try to do as much of this separation with style-sheets before you decide whether you really need to use a templating system or TagLibs. They're an extra level of pain to administrate, and half the time you *won't* be able to make the separation work. (If the project itself seriously demands specialist HTML designers, the chances are they won't understand or accept the restrictions of a tag system, and you'll end up working on the HTML through editing code anyway; so don't make life harder than it needs to be. )

Also, I advise not to worry too much about SQL in the pages. Queries are views on the data, and the chances are, you'll need to be able to evolve that quickly.

phil jones
Friday, May 07, 2004

Adrian's project sounds like a disaster waiting to happen. One of these unused functions is going to turn up to be critical to some apparently unrelated function and the application will start giving slightly different results for everyone.

Be careful!

MilesArcher
Monday, May 10, 2004

" EVERY method is a static method (i'm not kidding), "

Or maybe they're just into Prototype based languages :-)

phil jones
Thursday, May 13, 2004

*  Recent Topics

*  Fog Creek Home