Fog Creek Software
Discussion Board




Reversible Setup and Contracts

Joel makes the following statement in his recent comments about reversible setups:  "The account creation could fail for a myriad of reasons, none of which can be predicted before trying to create the account."

This is actually something that bothers me a lot about software that I interface with - the inability to predict whether an operation will succeed or fail for a given set of inputs.  Of course, there are unpredictable failures such as the network suddenly vanishing, but most failure modes should be identifiable before an operation is attempted.

One common example I've seen is a problem with the Collection objects provided by Visual Basic.  The remove operation will generate an error if it is called with a key that is not present in the collection.  Unfortunately, the Collection class doesn't provide a means for testing this condition.  In other words, this design places a responsibility on the caller without providing the caller with the means to ensure that the responsibility has been met.  The typical work-arounds that I have seen are to either 1)  Iterate through all the keys in the collection, searching for the target in the calling code or 2)  Ignore the error that gets generated.  Obviously, both of these are sloppy hacks. 

In more general cases there may not be a workaround, so you are forced to use trial and error.  A side effect of this is that you have to write a lot of "undo" logic so that you can clean up the side effects of your trial and error session.

The best solution I've seen for this is to employ design by contract techniques, pioneered by Bertrand Meyer in his Eiffel language.  In the case above, the contract for the remove operation specifies a precondition that the key must be in the collection.  This forces the class to expose a query that allows callers to verify that the condition holds *before* invoking the remove operation.  This allows the client code to deal with problems before attempting the operation, obviating the need for much of the rollback logic.
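As a rough illustration of the idea (in Python rather than Eiffel or VB), here is a minimal sketch of a keyed collection whose remove operation has a precondition the caller can test. The names KeyedCollection, has_key and PreconditionError are made up for this example and do not come from any real library.

```python
class PreconditionError(Exception):
    """Raised when a caller invokes an operation without meeting its contract."""


class KeyedCollection:
    """Hypothetical keyed collection whose contract can be checked by the caller."""

    def __init__(self):
        self._items = {}

    def has_key(self, key):
        # The query that lets callers verify remove()'s precondition up front.
        return key in self._items

    def add(self, key, item):
        # Precondition: the key must not already be in the collection.
        if self.has_key(key):
            raise PreconditionError(f"key already present: {key!r}")
        self._items[key] = item

    def remove(self, key):
        # Precondition: the key must be in the collection.
        if not self.has_key(key):
            raise PreconditionError(f"key not present: {key!r}")
        del self._items[key]
        # Postcondition: the key is no longer in the collection.
        assert not self.has_key(key)


inventory = KeyedCollection()
inventory.add("widget", 42)

# The caller tests the precondition before invoking the operation, so a
# missing key never turns into an error that has to be handled or undone.
for key in ("widget", "gadget"):
    if inventory.has_key(key):
        inventory.remove(key)
```

The point is only that has_key exists at all: the class demands something of the caller and also hands the caller the means to check it.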

It's clear that this sort of design goes a long way towards improving situations like the one Joel has faced with his setup program, so I am wondering why this approach is not more evident in the APIs we program against?  Any ideas?

Note that this kind of approach has the added benefit of making the semantics of each operation much clearer since properly using it shows the conditions under which it is valid to invoke the operation, the conditions which will exist after the invocation, and by implication the change in (observable) state experienced by the object being operated upon.

Kind of cool, no?

General Contractor
Tuesday, October 08, 2002

P.S.  Anyone notice the eerie similarity between the problem mentioned above and the one-way flow of responsibility that your managers continuously try to shoulder you with?

General Contractor
Tuesday, October 08, 2002

I know what you mean, but I don't agree with the collection key example. To me the key is the lowest level of identifying a collection member; the buck stops with the key. If you are trying to remove a nonexistent key, you have a problem. OK, so the key exists; but if you've got to check that, how can you be sure it's the right key? The argument goes on and on. The key is the lowest level; it is as close as you can get to certainty.

Alberto
Tuesday, October 08, 2002

Design by contract works well at low levels in a system, but gets fuzzier and fuzzier as you get to higher levels, particularly when parallelism and/or distributed systems are concerned.

Consider for example the creation of a userid, as Joel mentions. The pre-conditions are something like:

1. The specified account must not exist.
2. The specified domain must exist.
3. The password must satisfy the site's password strength rules.
4. The calling process must have permission to create a new user.
5. The domain controller must be up.

All well and good. It might even be possible to programmatically test each of these pre-conditions. But the real kicker is that each of these must remain true after you've tested them, up until the creation has succeeded. While you might be able to come up with technology to force 1-4 to remain true, you can't guarantee that a computer won't crash. (Someone like me might trip over the power cord, resulting in spilling my coffee into the UPS.)

So, from a practical point of view, most distributed systems eventually boil down to APIs that make a good faith effort to accomplish something, but don't make any promises. In particular, they almost never provide an exhaustive list of reasons why they might fail. And if they do provide such a list, it will be obsolete in the next release.

However, a much more interesting question about setup programs is: What do you do when your attempt to undo something fails? (That is, we created the account, something downstream failed, and we discover we can't delete the account.) This is where life gets tenuous.

Think about the user experience of a hypothetical FogBugz installation. In step 1, Setup stopped IIS. In steps 2-10, it perfectly installs FogBugz. In step 11, it tries to start IIS and fails. Because a failure occurred, it backs out the installation. When it gets to backing out step 1, it tries to start IIS again, and fails again.

In general, there is no way out of these situations. Which is why I always get nervous when I see "Setup is updating your system", regardless of how well-written the setup program is.
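The FogBugz scenario above can be sketched as a list of steps, each paired with its undo action. This is a simulation in Python under made-up names (run_setup, StepFailed, the state dictionary), not how any real installer is written; start_iis is rigged to fail every time to reproduce the case described.

```python
class StepFailed(Exception):
    """Raised by a setup step (or an undo step) that could not complete."""


def run_setup(steps):
    """Run (name, do, undo) steps in order; on failure, undo in reverse.

    Returns (succeeded, names_of_steps_whose_undo_also_failed).
    """
    completed = []
    try:
        for name, do, undo in steps:
            do()
            completed.append((name, undo))
        return True, []
    except StepFailed:
        undo_failures = []
        for name, undo in reversed(completed):
            try:
                undo()
            except StepFailed:
                # Nothing sensible is left to try; all we can do is report it.
                undo_failures.append(name)
        return False, undo_failures


# Simulated machine state; purely illustrative.
state = {"iis_running": True, "installed": False}

def stop_iis():
    state["iis_running"] = False

def start_iis():
    # Simulate an IIS that refuses to come back up.
    raise StepFailed("could not start IIS")

def install():
    state["installed"] = True

def uninstall():
    state["installed"] = False

result = run_setup([
    ("stop IIS", stop_iis, start_iis),
    ("install FogBugz", install, uninstall),
    ("start IIS", start_iis, stop_iis),
])
```

Here the files are backed out cleanly, but the undo of "stop IIS" fails for the same reason the forward step did, so the machine ends up with IIS stopped and no way for the setup program to fix it.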

Jim Lyon
Tuesday, October 08, 2002

The problem with General Contractor's example (Alberto touches on this) is that the key is the fundamental element of iteration through the Collection.  It's akin to having arrays warn you if your index is out of range.  That's the job of the programmer, not the data structure.  VB  is doing the right thing, IMO.

That said, I'd be very surprised if there weren't a method to check to see that a key exists in a collection (I'm not familiar with VB.  I'm extrapolating from the Java Collections Framework and the STL).  You'd use this method to check for the existence of the key before doing something that'll change the state of the structure (like removing a member).  For such a heavily used library, that would seem like a pretty harsh shortcoming.

Crimson
Tuesday, October 08, 2002

Jim,

The scenarios you presented are unlikely, but I suppose they are possible.  You can't protect against absolutely everything.

At some point, I think you have to cross your fingers and hope for the best.  :)

Crimson
Tuesday, October 08, 2002

Crimson,

VB collections do not, in fact, have a way to check if a particular key is in the collection. Yes, it sucks big time.

Others,

The big problem with the "test first, then do" philosophy is that it breaks down in multithreaded/distributed circumstances.  Just because it came back ok at the time I called the test function doesn't mean that it'll still be true at the time I actually try to make the change.
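A small sketch of that failure, in Python with hypothetical names (can_create, create, the accounts set). To keep it deterministic it interleaves the two callers by hand instead of using real threads, but the ordering is the one that bites in practice.

```python
accounts = {"alice"}

def can_create(name):
    # The "test" half of test-first-then-do.
    return name not in accounts

def create(name):
    # The "do" half; it still has to check for itself.
    if name in accounts:
        raise ValueError(f"account already exists: {name}")
    accounts.add(name)

# Caller A tests first and is told everything is fine...
passed_test = can_create("bob")

# ...but before A acts, caller B creates the same account.
create("bob")

# A's answer is now stale, so A's attempt fails anyway.
try:
    if passed_test:
        create("bob")
    outcome = "created"
except ValueError:
    outcome = "failed despite passing the test"
```

Unless the test and the action happen under one lock or one transaction, the test can only tell you that the operation would definitely fail, never that it will succeed.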

The "real" solution is to try to enforce transactional commit/rollback semantics. To get that to work, though, will require that sort of transaction to be baked into every system at the very lowest level. Not really practical, I'm afraid.

Looks to me like we're stuck with "try it, and be prepared to fail".

Chris Tavares
Tuesday, October 08, 2002

I have run into this with ADSI interface calls, where you have to insert an On Error Resume Next statement before your call (so you don't trigger an exception), check the return value, and then handle the error or continue if everything is okay.  Don't forget to put an On Error GoTo after you have done this....

Tim
Tuesday, October 08, 2002

A software engineer, a hardware engineer, and a manager are driving somewhere when the brakes fail on their car. Luckily, they manage to get the car to the side of the road and stopped where they get out and ponder what to do. The hardware engineer says “I think I can disassemble the braking system to figure out what went wrong.” The manager says “No, we should schedule a meeting and write a proposal.” The software engineer says “The hell with that! Let’s just get back on the freeway and see if it happens again!”

pUnk
Tuesday, October 08, 2002

When you try item(key).Remove and it fails (because for some bizarre reason the key doesn't exist), a good error handler is the standard fallback position. In the example you quote, where you try item(nonexistent key).Remove (or you cleverly figure out that the key does not exist before trying the remove), what do you do then? Either way you have an error.

Alberto
Wednesday, October 09, 2002

In the end it all falls back to the 'your call is valuable to us, you are number....continuous hummm' pattern. 

After you've put the phone down, and you redial you go through a diagnostic (eventually) with whatever connection you get at the other end to determine how much detritus of the previous call was left and then make rational judgements as to how to negotiate your information past the connection and into a form that the service at the other side recognises.

Or you wait 30 days, read the statement and then complain that your incomplete call meant that they increased your direct debit to 100 pounds a day.

This is why setup programs are complicated to write so that they seem simple.

Simon P. Lucy
Wednesday, October 09, 2002

Does anyone actually use vanilla Collections in VB?

The VB collection has to be one of the worst thought out data structures around.

The proper way to deal with Collections (ok, my way, at least) is to wrap the collection in a class that deals with all of the error handling for you. It returns a Nothing reference if the key does not exist. It has different methods for Item and Index (so that you can use a long as a key), a collection item initialisation method, a Clear method that clears the collection, and several methods for iteration.

This way, all of your collections look the same. And if, for some reason, you decide to implement the data structure with arrays and hash keys, then it looks exactly the same to the rest of your program.
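The shape of such a wrapper might look roughly like the sketch below. It is in Python rather than VB, so None stands in for Nothing, and the class and method names (SafeCollection, exists, item) are invented for the example. Python's dict already behaves sensibly here, so this shows only the interface, not a fix for a real Python problem.

```python
class SafeCollection:
    """Hypothetical wrapper that hides the underlying storage and its errors."""

    def __init__(self):
        # Storage is private; it could be swapped for arrays and hash keys
        # without changing anything callers see.
        self._items = {}

    def add(self, key, item):
        self._items[key] = item

    def exists(self, key):
        return key in self._items

    def item(self, key):
        # Returns None (the stand-in for Nothing) instead of raising.
        return self._items.get(key)

    def remove(self, key):
        # Reports whether anything was removed instead of raising.
        if key in self._items:
            del self._items[key]
            return True
        return False

    def clear(self):
        self._items.clear()

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Iterate over the stored items, not the keys.
        return iter(self._items.values())


bugs = SafeCollection()
bugs.add("case-1", "Setup fails to restart IIS")
missing = bugs.item("case-2")        # None, no error raised
removed = bugs.remove("case-2")      # False, no error raised
```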

One of the things that I have learned about computers is that they are unreliable and unpredictable. You can never know how they will fail, and they will always choose to fail in that 3-picosecond window that causes the most grief.

This is why contracts are doomed to failure. They presume that all parties are willing and able to abide by them. The only problem, is that no one asks the computer if it is willing to abide by the contracts.

Evan
Wednesday, October 09, 2002

Please excuse the use of the VB Collections example.  It was intended to illustrate a point, but apparently, I've failed to communicate it clearly.

Alberto:  The sort of code I was thinking of has the form myCollection.Remove(someKey).  In other words, I want to remove the item associated with a given key.

Crimson:  The collection class does not, in fact, have a way to test for the presence of a key in the collection (this is why I chose this example).  This is akin to *not being able* to test whether your array index is out of bounds before attempting an access.

Jim:  Excellent points, one and all.  I was trying not so much to advocate contracts (though I do), but to recommend a design philosophy compatible with them.  I don't mind having to deal with the failure of an operation.  As you astutely observe, no matter what precautions we take, failure will always remain as a possible outcome of attempting an operation.  This is especially true in complex environments like the ones you mention. 

What I do mind is having to rely on trial and error to detect *all* failure modes, especially semantic violations.  There are many failure conditions which *can* be detected a priori, but APIs often do not allow for the detection of even the simplest of these, like the VB Collections example I mentioned.  So, what I'm really driving at is using contracts to detect failure, not to guarantee success.  See what I mean? 

The contract preconditions specify what must be true so that there is at least a chance of success - they represent a lower bound.  We know that if the preconditions do not hold, then the operation will definitely fail.  Why not equip me to detect at least this much without having to attempt the operation?
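To make that concrete, here is a sketch in Python using some of the account-creation preconditions Jim listed. Everything in it is hypothetical: the function names, the dictionary standing in for a directory, the eight-character password rule, and the controller_up flag that simulates an unpredictable failure.

```python
MIN_PASSWORD_LENGTH = 8  # hypothetical password strength rule


class DomainControllerDown(Exception):
    """Stands in for a failure that no up-front check can rule out."""


def violated_preconditions(directory, domain, account, password):
    """Return the contract violations that are detectable before trying."""
    problems = []
    if domain not in directory:
        problems.append("domain does not exist")
    elif account in directory[domain]:
        problems.append("account already exists")
    if len(password) < MIN_PASSWORD_LENGTH:
        problems.append("password too weak")
    return problems


def create_account(directory, domain, account, password, controller_up=True):
    problems = violated_preconditions(directory, domain, account, password)
    if problems:
        # Known-bad request: refuse before touching anything, so there is
        # nothing to undo.
        return problems
    if not controller_up:
        # The preconditions held, yet the attempt can still fail.
        raise DomainControllerDown(domain)
    directory[domain].add(account)
    return []


directory = {"example.local": {"alice"}}
known_bad = violated_preconditions(directory, "example.local", "alice", "pw")
```

The check screens out requests that cannot possibly succeed; the caller still needs a handler for the failures that only show up when the operation is actually attempted.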

Evan:  I agree.  The VB Collection class is a poster child for poor design. 

General Contractor
Wednesday, October 09, 2002

Chris wrote:

[The "real" solution is to try to enforce transactional commit/rollback semantics. To get that to work, though, will require that sort of transaction to be baked into every system at the very lowest level. Not really practical, I'm afraid.]

The setup project (setup1.vbp) that comes with Visual Basic uses a transactional method to roll stuff back. I haven't used it extensively, but it might be a good starting point for anyone trying to understand or implement reversible setups.

BTW - does the freely available (and very good by all accounts) InnoSetup do a good job of being able to undo its actions? I haven't used it much.

Brad

Brad Thomas
Wednesday, October 16, 2002
