Fog Creek Software
Discussion Board




What would you do?

Totally hypothetical (cough) situation here: how would you handle it if you were in charge?

1 - Dev group decides the main work machine needs a reinstall for whatever reason (not important).

2 - All developers clear the machine of their stuff.  They all declare that they have nothing on the machine that can't be recovered from backups.

3 - After the reinstall it is discovered that one developer never backed up his project to any machine other than the one he was working on.  Three months of work are lost, and the client wants their project.  All of his backups were made to the same partition they were taken from.  This was done by your most senior programmer.  Even without the reinstall, this would have been a problem had the machine crashed unexpectedly one night.


What would you do about it in terms of:

1 - Getting the project back on task.
2 - Employee related actions.
3 - Making sure it doesn't happen again.

-Anonne

Anonne
Monday, February 25, 2002

We don't speak to each other except via our VCS system (VSS or ClearCase or whatever). We don't send each other builds via email, nor do QA copy files from a developer's machine. If it doesn't exist in the VCS database then it doesn't exist at all.

Now that you have a single thing (the VCS database) on one machine (the VCS server) it's easier to know what needs backing up.

Good thing that it was only a few months lost! You might survive. Our VCS contains years' worth of development.

Christopher Wells
Monday, February 25, 2002

Yeah, it looks like somebody really boneheaded that one :)
Or somebody is just making things up to create a long thread.

You should back up the development machine before a rebuild. That way, if a reinstall does not go well, you can roll back quickly without having to chase down all the developers. This is in addition to source control, of course.
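
If the box is a Unix machine, even a crude one-liner to another host beats nothing; the hostname and paths here are made up, so adjust to taste:

    # archive the work areas to another machine before wiping the box
    tar czf - /home /work | ssh backup-host "cat > /backups/devbox-`date +%Y%m%d`.tar.gz"

On Windows you'd reach for a drive-imaging tool instead.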

Assign a single person to be responsible for the total well-being of the build; that way you'll know who should be punished next time.

As far as punishment goes, I would not suggest firing the developer who did not have a copy of the code, because after this burn he/she will be more careful with code than anyone else. Firing the senior developer and/or the sysop should not be out of the question, because if they did not learn these things earlier in their careers, they probably are not worthy of their positions.

Michael K
Monday, February 25, 2002

By "employee related actions" I assume you're expecting somebody to say "sack the idiot who didn't do backups".

So... "sack the idiot in charge of this mess".

Because any company with a culture like that is even worse than the one I'm stuck with, and that's pretty amazing.

DB
Monday, February 25, 2002

1) CVS
2) AMANDA (tape backup system)
3) Staging server / workstations
4) Automated build scripts (ANT, Make, etc.)

Basic parts of any company's dev system; a rough sketch of how they fit together is below.
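
Roughly how the CVS and build-script pieces hang together, as a sketch only; the repository path, module name, and make targets are all invented:

    #!/bin/sh
    # nightly-build.sh -- sketch; assumes 'cvs login' was run once for this account
    set -e
    BUILDDIR=/var/builds/nightly-`date +%Y%m%d`
    mkdir -p $BUILDDIR && cd $BUILDDIR
    cvs -d :pserver:build@cvs-server:/cvsroot checkout myproject
    cd myproject
    make all test        # whatever your build scripts call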

Adam
Monday, February 25, 2002

Easy: spend the whopping $100-200 to buy a new drive for the box, and file the old one on the shelf.  Drives are so cheap that wiping one to reuse it rarely saves enough to be worth it.  Sounds like it was especially not worth it in your case.

It also sounds like you're way too dependent on your developers to manage their own source, which is a waste of their time, and pretty much guarantees you'll have a different management method for each developer.  You're also not using any central source code control system, or there would be historical backups.

The bonehead in this case is the manager, not the developers.  Any blame for this needs to be laid at the manager's feet, not at those of the "senior" person who failed to back up their own stuff "correctly".

James Montebello
Monday, February 25, 2002

Reduce the hourly rate of the person who lost the work.

Tell them that they are not up to speed yet.

Tony
Monday, February 25, 2002

Worse things have happened for better reasons.

1) Short term project:
It will probably take 1.5 months to reimplement (for free), if the client doesn't give you the middle finger.

2) Medium term asskicking:
Well, who's really at fault?  There is not enough information to locate the blame, and blame is secondary to locating the "problem."  I can imagine a dumb senior developer as easily as I can imagine a smart one who is just so harried by bad management that a fubar is bound to occur.

Don't you think morale would be incredible if this problem were solved intelligently, by fixing the faulty system?  Only if the guy is an overall dumbass should something bad happen to him personally.

3) Long term soul-searching:
Yeah, you betta search them souls.  Get back to basics.  How should things be stored?  Do the sysadmins have an automated way of setting a user up, hopefully with drive images?  How fast can a user get back into a project after a catastrophic computer failure?

Soul-searching.  Soul-searching.  Soul-searching. 

And if you're from Microsoft, DEVELOPERS!!!

Sammy
Monday, February 25, 2002

James-

What do you mean by "depending on developers to manage their own source"?

In my experience, developers are responsible for checking in their code regularly. Prerequisites are that the code compiles (you'd be surprised), has gotten some cursory testing, and doesn't screw anyone else up.  The last is a good-faith thing; sometimes it happens. 

Generally it is to your advantage to do this often, as you must integrate before you commit, and it's easier to manage small batches of changes.  It is rare to go a week without committing. I am trying to imagine going three months.
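
In CVS terms the daily loop is nothing fancier than this (the commit message is just an example):

    cvs update -d        # pull in and merge everyone else's changes
    # ...build, run the tests...
    cvs commit -m "Tighten input validation in the parser"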

Is this so rare? Why would it be someone else's responsibility? 

tangram
Monday, February 25, 2002

On managing source:

There should be an agreed-upon (at least group-wide) set of procedures, up to and including source code control, for how source should be managed.  There's nothing worse than having a developer out of the office and being unable to figure out what version of the source is what: what's just being hacked on vs. what's supposed to be relied upon.  If you delegate this to N people, there will be N different ways it gets done.

Also, managing the disk files (as opposed to their actual contents) should be the responsibility of an admin.  Backups, ensuring there's enough space, etc.

Testing, code correctness, etc., is still the responsibility of the developers.

James Montebello
Monday, February 25, 2002

Ha ha, I liked that one. It reminded me of a really ugly/scary situation that happened to me, one which taught me to always use real source code control in my projects.

About four years ago, I was working on my first Java project, and I was pretty much on my own regarding most things, as I was developing a standalone client app.
I had access to Visual SourceSafe, but for some strange reason I did not bother to use it.
What I did instead was simply back up all the source code from my machine to another machine, which itself got backed up every two weeks or so (yes, very irresponsible, I know that now).

Then, one evening, before leaving for the day, I went to perform my backup: I accessed my project folder on the backup machine through the network and deleted all the source files there, in order to replace them with the new ones afterwards.
But strangely enough, when I went to paste the new source code into the backup folder, somehow it did not work.
And when I looked at my local folder, everything was gone.

What had happened was that I had accidentally accessed my local folder through the network, instead of the one with the backups, and deleted all the work I had done in the previous two weeks. And I had deleted everything directly, without it going through the Recycle Bin, so I was kind of fried.

Thankfully, I still had all the Java class files lying around in another folder, and managed to regenerate the source code for my application using a Java class decompiler. Still, I spent about a day and a half rewriting all the source code comments, but at least that was better than spending far more time rewriting everything from scratch.

Needless to say, this taught me to always use source code control on my projects.

Gabriel Lima
Monday, February 25, 2002

1. No ideas; rewrite the lost work.

2. Fire the manager who failed to establish the required procedures.

3. I would do the following:

a) Make sure that your company's work does not depend on one person's habits. Assign several developers to one project, pair up programmers, move people around. This will force them to use the version control system.

b) Do automated daily builds. The build system will have to get the most recent version of the code from somewhere, so this will probably force you/the developers to use the VCS (a sketch follows below).

c) Make VCS backups an automated routine. Assign a person to own them; we have a sysadmin for this. Make off-site backups weekly (here I mean: burn a CD and take it home).
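
A rough sketch of what (b) and (c) can look like with CVS and cron; every path, hostname and schedule here is made up:

    # crontab entries
    #   0 2 * * *  /usr/local/bin/nightly-build.sh      # (b) daily build out of the VCS
    #   0 3 * * 0  /usr/local/bin/backup-cvsroot.sh     # (c) weekly repository backup

    #!/bin/sh
    # backup-cvsroot.sh -- a CVS repository is just files on disk, so tar is enough
    set -e
    DEST=/backups/cvsroot-`date +%Y%m%d`.tar.gz
    tar czf $DEST /var/cvsroot
    # copy $DEST to another machine, or burn it to the CD that goes home

The off-site copy is the step people tend to skip, so automate that part too if you can.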

Roman Eremin
Tuesday, February 26, 2002

Maybe it doesn't scale, but I finally gave up on tape backups as being uneconomical, given the relative prices of hard drives and big tape drives these days. I just bought a second hard drive and mirrored the drive where I keep source code. Cheap insurance. Yeah, there are single points of failure (disk controller, power supply), but they take the data offline without destroying it.

Mike Gunderloy
Tuesday, February 26, 2002

Mike wrote:
"I just bought a second hard drive and mirrored the drive where I keep source code. Cheap insurance. Yeah, there are single points of failure (disk controller, power supply), but they take the data offline without destroying it."

Good solution for daily backups. However, what if the computer gets lost in a fire, gets stolen, or is hosed with water? Not that this happens often, but I know someone whose company building burned down. They went bankrupt because there was no off-site backup. With off-site backups they would have been back in business within weeks.

It all comes down to how much risk you are willing to take. Depending on how important your files are, I would recommend that everyone make at least a weekly off-site backup. It's a small cost for some peace of mind.

Jan Derk
Tuesday, February 26, 2002

And to stay on topic:

"1 - Getting the project back on task."
Take your loss and get started asap. Not much else you can do.

"2 - Employee related actions."
It's all in the employee's response. IMHO people are allowed to make mistakes, even the really disastrous ones. I don't know many people who have never messed up big time. The important thing is that they learn from it.

If this incident clearly scared the sh*t out of this employee and it shows that he learned from it, I would not do anything but accept the apologies and maybe comfort him. If the employee shrugs the incident off as irrelevant and, even after an explanation, does not see where he went wrong: fire the bastard. Pity the world is not always black and white.

And even more important, as someone else noted: make sure you get to the one responsible for not having source code control and a good backup plan. Could that be you? (Just teasing ;) Same approach. Learning := OK; Not Learning := Fired; Gray area := I don't know;

"3 - Making sure it doesn't happen again."
Like the others say:
- Get good source code control.
- Get a backup plan.


There are no disasters, just learning experiences. That's easy to say from a distance, but if this incident causes your department to get better organized, then that's exactly what it is.

Jan Derk
Tuesday, February 26, 2002

"...main work machine..."

???

Everyone has to use the same PC?  Isn't that taking pair programming a little too far?

Guy Incognito
Sunday, March 03, 2002
