Fog Creek Software
Discussion Board

A Design problem

Hi, I wondered if any of you smart fellows out there might have an idea of how I should/could design a component of my n-tier system.

This part of the system works such that a user supplies some data and one or more files to a client application.

The client application calls a stateless middle-tier server component (think DCOM or .NET remoting single-call objects) and this component validates the data the user supplied, stores the data on a database and stores the file(s) supplied in the server's file structure.

I want to make this operation as atomic as possible (although I accept that storing the files will be outside of the main database transaction - unless someone knows how to tie these together in a Windows environment).

What concerns me more is how to sequence the actions. I can foresee a problem: if the data and file(s) are sent across the network first (a relatively slow operation for larger files) and the data then fails validation, we've sent a load of data across the network for no good reason (and will have to send it all again).

An alternative? Send the data only; the server creates a transaction, validates, updates and, prior to commit, drags the file(s) across. But holding the transaction open while the files transfer is going to cause poor performance - assuming the transaction doesn't simply time out in the meantime.

Another alternative? Send the data only and have the server validate it. If that succeeds, we're 99.something% likely to succeed overall, so we send the data again, this time with the file(s). The server starts a transaction, validates and updates the data, and somehow commits both the file updates and the database changes together.

This last alternative seems the most sensible method, but it involves two calls, one of them a dummy run that any performance-conscious designer would want to avoid.
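
For concreteness, here is roughly the shape of server interface I have in mind for that last alternative. All the names are made up and the payload classes are only placeholders, so treat this as a sketch rather than the actual design:

using System;

// Hypothetical stateless server component (think SingleCall remoting object).
// ValidateData is the dummy run; SubmitUpdate is the real thing.
public interface IUpdateServer
{
    // First call: data only, so a validation failure costs us
    // nothing but a small data transfer.
    ValidationResult ValidateData(UpdateData data);

    // Second call: data again, plus the file contents. The server
    // re-validates, writes the files and commits the database row(s).
    SubmitResult SubmitUpdate(UpdateData data, FilePayload[] files);
}

[Serializable]
public class UpdateData
{
    public string UserName;
    public string Description;
    // ... whatever else the user supplied
}

[Serializable]
public class FilePayload
{
    public string FileName;
    public byte[] Contents;   // fine for smallish files; really big ones
                              // would need a chunked/streamed transfer
}

[Serializable]
public class ValidationResult
{
    public bool IsValid;
    public string[] Errors;
}

[Serializable]
public class SubmitResult
{
    public bool Succeeded;
    public string[] Errors;
}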

Anyone seen this before or got any ideas?

For a parallel system think Continuus/CM Synergy or PVCS Dimensions. These both have to synchronise the uploading and storage of meta-data and files.

The development language is Windows .NET although this should have no impact on the solution.

Thanks in advance

Gwyn
Wednesday, January 29, 2003

Do The Simplest Thing That Could Possibly Work.

I'd just transfer the data and file to the server, and then do the data validation. It simplifies your code tremendously, and eliminates a large set of failure modes in one fell swoop.
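
To be concrete, that's just one round trip - something along these lines, re-using the made-up data/result classes sketched in the original post (the Validate/Store bodies are placeholders, not real code):

using System;

public class UpdateServer : MarshalByRefObject
{
    // One call does everything: validate, store the files, store the data.
    // If validation fails, the client has wasted one upload - measure how
    // often that actually happens before designing around it.
    public SubmitResult SubmitUpdate(UpdateData data, FilePayload[] files)
    {
        SubmitResult result = new SubmitResult();

        string[] errors = Validate(data, files);
        if (errors.Length > 0)
        {
            result.Succeeded = false;
            result.Errors = errors;
            return result;
        }

        StoreFiles(files);   // write to the server's file structure
        StoreData(data);     // insert the row(s) inside a DB transaction

        result.Succeeded = true;
        return result;
    }

    private string[] Validate(UpdateData data, FilePayload[] files) { return new string[0]; }
    private void StoreFiles(FilePayload[] files) { }
    private void StoreData(UpdateData data) { }
}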

Don't optimize without measurement! Try it the simple way first, get it working, and only then worry about how fast or slow it might be. It's usually fast enough.

Chris Tavares
Wednesday, January 29, 2003

Thanks for your reply.

I am inclined to agree with you, but on the other hand I'm reluctant to implement an architecture I know is flawed when a little foresight would enable me to get it right first time.

I've seen too many products that are now doomed because the architectural flaws have come to light but it's way too far down the line to start changing the basic architecture of the product; it'd just be too costly. The product will never recover.

Gwyn
Wednesday, January 29, 2003

Then you need to properly factor your architecture so that this kind of change CAN be done cheaply.

After all, you only need this in two places, right? The "write to database" module on the client and the "receive update" module on the server. If these two modules are properly built, the rest of the system shouldn't have to care about how or when the database is updated, or in what order.

If you can't do that, you might want to rethink more than just your database update strategy.
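
For instance (made-up names again, building on the interface sketched in the original post), keep the rest of the client behind something like this, and swapping strategies later is a one-class change:

using System;

// The rest of the client only ever sees this.
public interface ISubmissionStrategy
{
    SubmitResult Submit(UpdateData data, FilePayload[] files);
}

// The simplest thing: one round trip, the server does everything.
public class SingleCallSubmission : ISubmissionStrategy
{
    private IUpdateServer server;

    public SingleCallSubmission(IUpdateServer server)
    {
        this.server = server;
    }

    public SubmitResult Submit(UpdateData data, FilePayload[] files)
    {
        return server.SubmitUpdate(data, files);
    }
}

// If measurement later says the wasted uploads matter, drop in a two-call
// or staged-upload implementation of ISubmissionStrategy instead.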

Chris Tavares
Wednesday, January 29, 2003

Send the files first. Store them in a transient location.

Then send the data and execute the transaction.

If the transaction fails, allow it to be restarted with the files in the temp location.
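
Roughly like this on the server side (the paths and names are made up, the Validate/StoreData bodies are placeholders, and the data/file/result classes are the ones sketched earlier in the thread):

using System;
using System.IO;

public class StagedUpdateServer : MarshalByRefObject
{
    private string stagingRoot = @"C:\FileStore\Staging";   // made-up path
    private string finalRoot   = @"C:\FileStore\Final";     // made-up path

    // Phase 1: park the files somewhere temporary and hand the client
    // a ticket that identifies them.
    public Guid StageFiles(FilePayload[] files)
    {
        Guid ticket = Guid.NewGuid();
        string dir = Path.Combine(stagingRoot, ticket.ToString());
        Directory.CreateDirectory(dir);

        foreach (FilePayload file in files)
        {
            string path = Path.Combine(dir, file.FileName);
            using (FileStream fs = new FileStream(path, FileMode.CreateNew))
            {
                fs.Write(file.Contents, 0, file.Contents.Length);
            }
        }
        return ticket;
    }

    // Phase 2: validate the data, write it in a DB transaction, then move
    // the staged files into their final home. If anything fails, the staged
    // files are still sitting there, so the client can retry with the same
    // ticket instead of re-uploading.
    public SubmitResult CommitUpdate(Guid ticket, UpdateData data)
    {
        SubmitResult result = new SubmitResult();

        string[] errors = Validate(data);
        if (errors.Length > 0)
        {
            result.Succeeded = false;
            result.Errors = errors;
            return result;                  // staged files kept for a retry
        }

        StoreData(data);                    // placeholder: DB transaction

        string from = Path.Combine(stagingRoot, ticket.ToString());
        string to   = Path.Combine(finalRoot, ticket.ToString());
        Directory.Move(from, to);           // note: requires both roots on the same volume

        result.Succeeded = true;
        return result;
    }

    private string[] Validate(UpdateData data) { return new string[0]; }
    private void StoreData(UpdateData data) { }
}

You'd also want something that sweeps out staging directories that never get committed.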

Adam Young
Wednesday, January 29, 2003

If that's what you really want, you must treat this transaction as a distributed one with multiple phases.

Then you have to resolve all the synchronization between phases (send the data first, send the files second) and make sure to keep things atomic.

And things tend to get complicated:
1) you have to treat this as an atomic request-response transaction, which means you need a stateful super-transaction around the whole thing, with an ID passed along with the actual data in each phase (see the sketch after this list)
2) resolve all the state synchronization between phases
3) put timeout guards around the super-transaction so that you don't get stuck between phases (on both client and server)
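
A rough sketch of the kind of bookkeeping that implies (all the names, the Hashtable approach and the 30-minute figure are made up purely for illustration):

using System;
using System.Collections;

// Every multi-phase upload gets an ID, some state, and a deadline, and
// something has to come along and throw out the ones that never finish.
public class PendingUpdateTable
{
    private Hashtable pending = new Hashtable();            // Guid -> PendingUpdate
    private TimeSpan timeout = TimeSpan.FromMinutes(30);    // made-up figure

    public Guid Begin()
    {
        Guid id = Guid.NewGuid();
        PendingUpdate update = new PendingUpdate();
        update.Id = id;
        update.Deadline = DateTime.Now + timeout;
        lock (pending) { pending[id] = update; }
        return id;
    }

    public PendingUpdate Get(Guid id)
    {
        lock (pending) { return (PendingUpdate) pending[id]; }
    }

    // Call this from a timer: abandon anything that overran its deadline
    // (and clean up whatever files or partial work it was holding).
    public void SweepExpired()
    {
        lock (pending)
        {
            ArrayList expired = new ArrayList();
            foreach (PendingUpdate update in pending.Values)
            {
                if (DateTime.Now > update.Deadline)
                {
                    expired.Add(update.Id);
                }
            }
            foreach (Guid id in expired)
            {
                pending.Remove(id);
                // TODO: delete staged files, roll back partial DB work, etc.
            }
        }
    }
}

public class PendingUpdate
{
    public Guid Id;
    public DateTime Deadline;
    // plus whatever per-phase state you need to carry between calls
}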

So, you see, a simple thing gets soooo complicated that I would wonder if that's a price I want to pay. On top of that, you don't really know if your problem is indeed a real bottleneck in the system.

Here's what I would do in this situation:
- wrap this transaction such that I can modify it later without major code changes.
- implement it the simplest way
- figure out if there is a performance problem or not
- if (and only if) there is a performance problem, then I would go the complicated way.

Another thing: the multiple-phase transaction looks out of proportion (too complicated for what it does), so I wonder whether it is a good solution anyway. I would spend more time thinking about it. Try "Fire and Motion" ...

Ciao
D

Dino
Thursday, January 30, 2003

You haven't said exactly what you mean by "validation", so I don't know whether anything could be done on the client side as an added safety measure (not as a replacement for what the server does).


Here's how I would do it:

The client receives the user's information, then validates the data and the files as far as it can: ensures everything is in the proper format, the file isn't broken, etc.

THIS STEP MAY NOT BE NECESSARY FOR YOU: the client then sends the data only (no files) to the server, assuming that isn't resource-intensive enough to cause a problem of its own, and the server validates it and sends back an OK or an error. This step can fail or be removed entirely, because it is only a safety measure to avoid wasting resources transferring files.

The client then sends both the file AND THE DATA at the same time to the server, and the server validates everything (both data and file) and then stores them. (If you "check then save, check then save" piecemeal, you may end up needing clean-up code, which is usually BAD - although depending on how the rest of your system is designed, it can be made not to matter.)

Then it sends back an OK to the client, and you are all done.
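
From the client's point of view, the whole flow is something like this (made-up names, re-using the interface and classes sketched earlier in the thread; step 2 is the optional one):

using System;

public class UploadClient
{
    private IUpdateServer server;

    public UploadClient(IUpdateServer server)
    {
        this.server = server;
    }

    public SubmitResult Upload(UpdateData data, FilePayload[] files)
    {
        // 1. Local checks first: formats, file readable, etc.
        string[] localErrors = ValidateLocally(data, files);
        if (localErrors.Length > 0)
        {
            SubmitResult failed = new SubmitResult();
            failed.Succeeded = false;
            failed.Errors = localErrors;
            return failed;
        }

        // 2. Optional safety step: a cheap data-only round trip, so a
        //    validation failure doesn't cost a file upload.
        ValidationResult check = server.ValidateData(data);
        if (!check.IsValid)
        {
            SubmitResult failed = new SubmitResult();
            failed.Succeeded = false;
            failed.Errors = check.Errors;
            return failed;
        }

        // 3. The real thing: data and files together; the server
        //    re-validates everything before it stores anything.
        return server.SubmitUpdate(data, files);
    }

    private string[] ValidateLocally(UpdateData data, FilePayload[] files)
    {
        // ... whatever the client can check without the server ...
        return new string[0];
    }
}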


The key here is that everything being done that is not absolutely necessary (i.e. the redundant parts - and note that redundancy is not necessarily bad, which is why humans have two kidneys and an oversized, regenerating liver by default) exists only to minimize the chance of files being transferred wastefully, or of data/files being stored improperly or unnecessarily.

And all this without sacrificing any degree of security.


This is the best way I can imagine of handling this, without knowing the nitty gritty details about what exactly the data and files are, what they are being validated for, or how the rest of the system accesses the database.

It is also only one step beyond The Simplest Thing That Could Possibly Work, and it's a very simple optimization which takes little time and is not likely to harm the system.

I think this does what you are wanting to do. Hope this is helpful, and good luck :)

Brian Hall
Wednesday, February 05, 2003

Thanks guys

Gwyn
Saturday, February 08, 2003
