Fog Creek Software
Discussion Board

To refactor, or not to refactor...

Is there ever a time not to refactor?  That a codebase is so dingy, so dirty, so horrible that every time you read it you get in a really bad attitude, that you should start from scratch?  Let me preface this by saying I've read Joel's articles on refactoring.

I think its a good idea, most of the time.  But I think that I've come across a project that it might not be the best way to go about it.
Here is the short rundown: It was written around 1996 through 97 (actually, only a few months in those years).  It is about 30000 lines of perl (mostly duplicated) over 160 cgi scripts.  It doesn't use a database, rather it uses the filesystem, and about 30 undocumented files to store the data (I've deciphered quite a few).  It has its own templating system, based on the order of the arguments passed to the template function (i.e. the first argument is #1#, the second #2#, etc).  It doesn't use my(), only local() (for those of you not familiar with perl, think all globals, all dynamically scoped), and there is quite a bit more that I could go on about as well.  Oh, did I mention it doesn't use strict, warnings, or taint mode?
Its an old Content Management System.  I've written a good part of one before so I have a good idea of what I want and don't want.  What I want to do is put this whole system into mod_perl, with a database, following the Model-View-Controller paradigm.  Now I've mapped out my plan for refactoring already, but I noticed in maintaining this and looking at it to plan the refactoring that all of the scripts are pretty much one function.  So they have a pretty well-defined purpose in life.  I'm convinced I could just take the functionality it needs and implement it in mod_perl.  Nothing is complex, its all 'get data from form, process, put into database'.
Now maybe this is all me rationalizing rewriting it myself, but I've thought a lot about this, and think rewriting it clean might be the best way.
Lastly, to give you an idea of this project, it was written mainly by a guy that thought it was a good idea to put the perl binary in cgi-bin webspace so he didn't have to type '#!/usr/bin/perl' at the top of his scripts, only '#!perl'.  There are similarly well-thought out things throughout...
So what should I do?  Have any of you come across projects that you just shouldn't refactor, because of large architectural changes or other reasons?  Thanks for the input...

Frustrated Perl Programmer
Wednesday, November 27, 2002

if it doesn't do much of anything already, throw it out.
I am involved with a project now where some guys took about 2 years to implement 1/20th the functionality of "Tomcat," with no comments and no documentation. I convinced my boss to chuck the original pile, and start anew. 3 weeks into it, all exitisting features were implemented, along with 80% of the actual application level features. The whole damn thing is going to take about 4 months, MAX.  I think Joel's advice only works if your original crufty codebase already does 80-90% of what you need. If it does < 20%, and it sucks, get rid of it.

software engineer
Wednesday, November 27, 2002

well, it mostly (80%?) does everything it needs.  Its just a real pain to modify anything.  and I mean *real* pain.  Want to the main file format?  Cool, but *every* search query will have to be re-written.  And its not so much a query, as a entry in a file, where each line looks like:

<filename>|perl expression that returns 1 or 0|Search title|morestuff

And in that line, you can have the template substitution I talk about before (with #1#, #2#).  Also, !N! means the N entry in the filename given in the line, by a pattern match given in the morestuff.  Not every file has it all filled out, so this is by memory.

So the problem isn't that it isn't finished.  It could be used in its current state just fine -- as long as you never need any changes. It actually has some neat features.  But, to make any changes takes so overwhelmingly long, that I don't think its worth nursing this code anymore.  Case in point:  You want to add a few items to the survey feature in this site, a few more questions, thats it.  Easy?  No, undocumented file, with assumed positions in the file, etc etc.  About 4 hours of work.  Should be less than 5 minutes with a web form.

Frustrated Perl Programmer
Wednesday, November 27, 2002

when I say "not every file has it filled out, so this is by memory" above, I meant that I was looking at an older version of the file (there are multiple versions of the files in this) that didn't have all the whiz-bang features used.  So I was describing it by memory :)

Frustrated Perl Programmer
Wednesday, November 27, 2002

Ironically, the Quote of the Day on is

Refactoring improves the design. What is the business case of good design? To me, it's that you can make changes to the software more easily in the future.

Refactoring is about saying, "Let's restructure this system in order to make it easier to change it." The corollary is that it's pointless to refactor a system you will never change, because you'll never get a payback. But if you will be changing the system‹either to fix bugs or add features‹keeping the system well factored or making it better factored will give you a payback as you make those changes.

--Martin Fowler

Wednesday, November 27, 2002

Fix what offends the eye and fix it in a way that doesn't break everything else.

Repeat until glorified.

Simon Lucy
Thursday, November 28, 2002

Sounds like you have a Big Ball of Mud ( my friend :-)

I'm not sure I understand exactly what you're saying though. Do you mean that every time the database (files) are used, the code to access them is inlined in the application code? If so then that's an excellent place to start your refactoring if you want to change to a db system. Write a DatabaseThing which takes those "queries" and translates them to queries to the files and replace the application layer file access code with calls to your Thing (as Joel suggested, that latter can probably be done with a macro in your favouite editor).

Cool. Now if you want to use a db instead of induhvidual files you simply change the DatabaseThing so that it translates the old style queries to SQL (or whatever), and replace your file based system with a proper db, assuming this is what you want to do. Sounds easy if you say it quickly ;-) (Incidentally, I once worked on a project where we changed from a noddy db system to a proper one in one day, precisely because we had this type of setup.) Then you can refactor again so that that application code doesn't know about which files/tables the data is in. Then you can refactor the db so that it is fast and/or normalised.

As Simon Lucy suggested, if the benefits warrant it then eventually you have refactored everything, but at each stage you have had a fully working system. When you should test it is really up to you, but after each refactoring stage is probably a good idea, then changes can be released frequently and hopefully whoever is using it can see the benefits as it evolves (either that or they see there are no benefits, in which case you will be the one on benefits :-)

Someone who is currently rewriting something from scratch *sigh*
Thursday, November 28, 2002

I liked the last answer, but I'd add that to refactor safely you need tests.

So before you touch anything right some tests that ensures (as much as possible) that the functionality doesn't change. Then writing the facade stuff will be safe, because you can feel comfortable that nothing has changed.

These tests will boost your confidence again when you change the "backend" of facade to the db. Again you'll be certain that you're getting back what you expect.

I believe all this crosses over to the questions of business value and "You Aren't Going To Need It". I believe that current best practice would suggest that if you have to touch something to add business value, write the tests, refactor, and leave everything else alone until you have a need.

That way you've provided business value, and you've improved the code, and you've kept the system going.

Daniel Berlinger
Thursday, November 28, 2002

Basically, our rule of thumb is:

If the code doesn't bother you, don't bother it.

Everyone has some crufty code somewhere. But if that crufty code isn't generating any bugs or doesn't need to be extended in any way, then for god's sake don't mess with it.

On the other hand, if you are being asked to extend the code, then it's open season on refactoring.

Bruce Rennie
Thursday, November 28, 2002

Forgot to mention: I ABSOLUTELY agree with the person above who stated that in order to refactor properly you have to have unit tests.

No unit tests, then tread with EXTREME caution when refactoring.

Bruce Rennie
Thursday, November 28, 2002

Just out of curiosity: What is your estimate for a complete rewrite of this system?

Just me (Sir to you)
Friday, November 29, 2002

In my opinion, there is a point at which you've crossed the line and have to rewrite rather than refactor.  That is when the code is really really bad, hard to change, done all wrong, and most importantly doesn't even come close to meeting its requirements.  The only time you don't keep anything is when there is nothing worth keeping.  It sounds like what you have isn't as bad as all that.

If it would take more time to rewrite than refactor, I'd say you don't have a case for rewriting.

Keith Wright
Friday, November 29, 2002

Estimate for a total rewrite, about 3 months.  The software doesn't fall into the "doesn't do its job" category.  It falls into the "does its job, but very brittle and resistant to change".  Even to just change a template.

I see how unit tests would be a good idea.  I think the first thing I'm going to have to do is get this software easy to modify.  i.e. get it all to run under strict mode in perl, with warnings on.  Then, I'll start with the db/filesystem interface, then template interface, etc.

Thanks for the input, I think i'm going to try to refactor!

Frustrated Perl Programmer
Friday, November 29, 2002

Rewrite the stinking thing if your boss will let you.  If the code is mostly garbage and you have to change more than 25% of it, you're probably better off rewriting it.

I've refactored and rewritten thousands of lines of code myself, and let me tell you, good code mixed with crappy code ultimately becomes crappy again unless fight VERY HARD to keep the clean parts clean.  It's like placing a bucket of sewage beside a bucket of perfume.  The sewage will always win.

After a rewrite you'll probably find that the code size has dropped by more than 60%, and many changes that used to take hours and even days of coding and testing will drop down to a few minutes.

T. Norman
Friday, November 29, 2002

You should also consider whether refactoring will get you to where you want to be. If fundamental design is flawed, then it's going to be very difficult to fix by refactoring alone. (When I say "flawed", I mean that it cannot accomodate required changes without a large redesign).

Also, try adding up the amount of time it will take to implement any upcoming modifications with the code in it's current state, with the code refactored and with the code rewritten.

If the rewrite will take longer than refactoring, but will significantly reduce the amount of time to make any future mods, then maybe a rewrite is justified.

Saturday, November 30, 2002

I would say: do not throw it away all at once. Even if you consider it poorly done and you do not want to live with it, at least it does a major part of the work it is supposed to do. Start rewriting, if you like, but try to start on one end (lets say only the "get data from form" part) and leave the rest as it is to begin with. I know this normally means coming up with a stupid and ugly wrapper interface between the old and the new part, but it might be a good alternative to either a complete start from scratch or a "classic" refactoring.

Of course this is only possible, if the old code is modular enough to do something like this.

Have fun,

Jutta Jordans
Tuesday, December 3, 2002

Modular. ha.  Everything is cut/paste. The only modularized stuff does the key substitution in the templates (based on parameter order, blech) and reads in entries from files (but the caller has to know the file format and name).

I still think its possible though, its just going to be a lot of work.

Frustrated Perl Programmer
Tuesday, December 3, 2002

So, which path was chosen, and what happened?

I have a similar case.

Andre Mueningoff
Wednesday, February 11, 2004

*  Recent Topics

*  Fog Creek Home