Fog Creek Software
Discussion Board

Decoupling Request and Response

I am working on a new application, so I am currently experimenting with different approaches to solving my problem.  Very generally, the problem is to let the user enter some information on a web page, press a button, and get some results back.  Very familiar territory, I'm sure.  There are a few twists, though, so I am trying different things to see what works best.

One approach that seems promising would have several clients put their request into a queue which is then serviced by a shared analysis engine, which produces a result that can be formatted and returned to the user.  The problem is I don't know how to: 1)  get the ASPX page to wait while the analysis engine services its request, 2)  have the engine get the results back to the ASPX page, and 3)  make the analysis engine stay alive all the time.

I've come up with all kinds of approaches, but each is complicated and messy, so I'd like to see if there is something simple I could do so that I can try this out.  Anyone have any ideas?

I'm using C#/ASPX/SQL Server, if that's important.  Thanks in advance.

three-tiered newbie
Friday, September 12, 2003

You could run batch jobs like this and get a collective answer back for multiple clients. The only problem is that you have to program your web flow so that going back and asking "are we there yet?" is part of the form filling.

You could have a simple JavaScript/VBScript routine on the client side that polls with client-side XMLHTTP and a job ID. So, for example, given a pool of concurrent outstanding requests, you might create a couple dozen batch jobs.

Each of the twenty-four jobs is ID'd 1-24, and when a job finishes it writes a "done" row into a table.

You then need a simple XML ASP page that queries this table to see whether a given job is finished:

qry = "SELECT TOP 1 DateAdded " & _
      "FROM tblBatchJobs " & _
      "WHERE JobID = " & Request("JobID") & " " & _
      "AND DATEDIFF(hh, DateAdded, GETDATE()) < 2 " & _
      "ORDER BY DateAdded DESC"

Set RS = Conn.Execute(qry)  ' Conn assumed to be an open ADO connection;
                            ' validate/parameterize JobID in real code
If RS.EOF Then
    ' Either the batch job is more than two hours old or it never existed...
Else
    ' Job completed
End If

On the client side, once the job is confirmed complete (the server will return something like the following):
<JOB>
  <ID>5</ID>
  <Completed/>
</JOB>
you can redirect the user forward in the form flow.
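If you'd rather do the status page in C#/ASPX (since that's your stack), here's a minimal sketch. The page name and connection string are made up, and the tblBatchJobs table is just carried over from my example above, so adjust to taste:

// CheckJob.aspx.cs -- minimal sketch, not production code
using System;
using System.Data.SqlClient;

public class CheckJob : System.Web.UI.Page
{
    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);

        // Placeholder connection string -- use your own
        string connString = "server=(local);database=Jobs;Integrated Security=SSPI";
        string jobId = Request["JobID"];

        Response.ContentType = "text/xml";
        using (SqlConnection conn = new SqlConnection(connString))
        {
            SqlCommand cmd = new SqlCommand(
                "SELECT TOP 1 DateAdded FROM tblBatchJobs " +
                "WHERE JobID = @JobID AND DATEDIFF(hh, DateAdded, GETDATE()) < 2 " +
                "ORDER BY DateAdded DESC", conn);
            cmd.Parameters.Add("@JobID", jobId);
            conn.Open();

            // No row back means the job is too old or never existed
            if (cmd.ExecuteScalar() == null)
                Response.Write("<JOB><ID>" + jobId + "</ID></JOB>");
            else
                Response.Write("<JOB><ID>" + jobId + "</ID><Completed/></JOB>");
        }
    }
}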

But I really think you should just use email notification for anything that requires the user to wait more than 120 seconds.

-- David

Li-fan Chen
Friday, September 12, 2003

Sometimes I get this nagging feeling I am doing homework for IT freshmen...

Li-fan Chen
Friday, September 12, 2003

Are you CPU bound or IO bound?  That would affect your design.  If it is all database-oriented, what is wrong with using ADO on the server?  I've worked on CPU-bound (calculations and charting) web-based systems and ended up implementing a custom queueing solution.  But that was a few years back.  The tools might be better today, but somehow I doubt it.

christopher baus
Friday, September 12, 2003

Another question: is the analysis complicated enough that it requires dedicated hardware?  In my opinion this starts to get non-trivial...  My guess is that trying to use something off the shelf will involve myriad workarounds.

christopher baus
Friday, September 12, 2003

Thanks for the responses so far. 

What's plaguing me is that each computation requires a lot of highly structured data to complete.  I could probably write a giant mess in the database to get the results, but I'd prefer to be able to load the large data structures when the system comes up and then just leave them there while clients query against them, so that I can take advantage of the structuring capabilities of objects in the middle tier.  Loading all this data each time a request is made doesn't seem to be an option.

I think the comment about being a freshman was kind of unnecessary.  If you have reservations about offering advice, feel free to withhold it.  I was hoping more for a discussion among colleagues than a "give me the answer" kind of thing.

three-tiered newbie
Friday, September 12, 2003

I don't think it will be a problem to run the calculation engine on the same machine as the web server. 

three-tiered newbie
Friday, September 12, 2003

Hey, I am sorry to have offended you. It really didn't seem all that complex at first, and as the bar rises it's becoming less unusual for less experienced programmers to tackle such big architectures. Of course, an experienced programmer will have a better shot at successfully completing such a project. Anyway, assuming you can forgive me...

Sounds like you are talking about a main-memory database. What's the application? Can you elaborate on the thinking behind your architecture? Tell us a bit more; I think that will help.

Li-fan Chen
Friday, September 12, 2003

No problem. I was probably a little overly sensitive because it's already a blow to my overly puffed up ego to have to ask for advice ;)

Anyhow, I don't want to get bogged down in minutiae, but the basic application is for production scheduling/workflow.  There are three large data structures: a bill of materials (a treelike structure), a routing map (a network structure), and a rete network that is used to operate on these.  The application has several kinds of clients for doing things like asking questions about the system, entering data into the system, and running calculations on the system for things like what-if analysis and data summarization.

Earlier implementations of this pulled a lot of data down to a workstation and did most of the work there.  I'd like to deploy it much more widely than it is now, though, for a variety of reasons (thus the focus on the web).  I'd also like it if several people could connect and observe the data live so that they could work together on some of the what-if modelling.

Obviously, I could deploy a client application to do these things, but a web client would be nice for a variety of reasons.  Also, like I mentioned earlier, I have several approaches that I'm pretty sure would work, but I'm trying to avoid a really messy solution if I can.  What I'm mostly after is clean approaches to some of the problems I mentioned in the original posting.

Thanks again for the input on this stuff too.

three-tiered newbie
Friday, September 12, 2003

It originally sounded like you wanted to run your queries asynchronously, but your later posts make it sound like you're mainly concerned with paying a startup overhead for each request.  If that's it, you can run as a service or a COM+ application, or just force your dll to stay loaded (e.g. by storing a reference in the Application object).
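For that last option, here's a minimal sketch, assuming a Global.asax code-behind; AnalysisEngine is a made-up stand-in for whatever your engine class is:

// Global.asax.cs -- minimal sketch; AnalysisEngine is hypothetical
using System;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Build the expensive data structures once, when the app starts,
        // and keep them alive for the life of the application
        Application["Engine"] = new AnalysisEngine();
    }
}

// Then any page can reuse the already-loaded instance:
//   AnalysisEngine engine = (AnalysisEngine)Application["Engine"];
//   object result = engine.Execute(command);

The usual caveat applies: ASP.NET can recycle the application at any time, so the engine has to be able to rebuild itself from the database on the next Application_Start.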

Brian
Friday, September 12, 2003

The particular approach I am asking about basically amounts to a main memory network database containing on the order of 2 million objects.  The heavy structure makes it difficult to do some of the operations cleanly in an RDBMS.  I am no database expert, though, so that could be a factor.  In earlier client/server implementations, I would have the client pull a portion of the data over when the application started and work locally.  As things are getting more and more heavily shared, though, the scope of each operation is widening, so parceling up the data is not as easy as it once was.  The simplest thing is to just have one model with all the data instead of many models with most of the data.

My thinking is that each client request would be shaped into a command and put into a queue (so that I could serialize access to the shared data structures).  The data manager/calculation engine would pull each of these out of the queue and execute them against the data, producing a result.  This result would then find its way back to the requestor, get formatted, and be returned to the client.  For long-running calculations (some of the scheduling operations can take a long time), I plan to use email alerts similar to the ones described above.
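In rough code, what I'm picturing is something like this toy sketch (all of these names are invented; .NET 1.1-style, no error handling):

// Minimal sketch of the command queue idea -- names are made up
using System;
using System.Collections;
using System.Threading;

public class PendingCommand
{
    public object Command;                      // what to run
    public object Result;                       // filled in by the engine
    public ManualResetEvent Done = new ManualResetEvent(false);
}

public class EngineQueue
{
    private Queue queue = new Queue();

    // Called from each ASPX request thread; blocks until the engine answers
    public object Submit(object command)
    {
        PendingCommand pc = new PendingCommand();
        pc.Command = command;
        lock (queue)
        {
            queue.Enqueue(pc);
            Monitor.Pulse(queue);               // wake the engine thread
        }
        pc.Done.WaitOne();                      // request thread waits here
        return pc.Result;
    }

    // Runs forever on one dedicated engine thread, serializing all access
    public void EngineLoop()
    {
        while (true)
        {
            PendingCommand pc;
            lock (queue)
            {
                while (queue.Count == 0)
                    Monitor.Wait(queue);        // sleep until Submit pulses
                pc = (PendingCommand)queue.Dequeue();
            }
            pc.Result = Execute(pc.Command);    // only thread touching shared data
            pc.Done.Set();                      // release the waiting request
        }
    }

    private object Execute(object command)
    {
        return null;                            // stand-in for the real engine
    }
}

Only the engine thread ever touches the shared structures, which is the serialization I'm after, and each ASPX request just blocks on its own event until its command has been run.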

At this point I can't really say whether the application would be IO bound or CPU bound.  Some of the global calculations can be very CPU intensive, but they can also touch a large percentage of the objects in the system as well.  I'm hoping to do some experiments to uncover answers to questions like this, but I gotta have something to experiment with first ;)

three-tiered newbie
Friday, September 12, 2003

Brian,

Thanks for the tip.  I'll have to look into those techniques.  You are correct; part of my problem (since I am, as advertised, a three-tiered newbie) is that I can't pay the overhead of loading on each request.  Someone else over here recommended looking at building a Windows service, too, so I think I will start there.  Thanks again.

three-tiered newbie
Friday, September 12, 2003

If you're pulling the whole database into memory then you are CPU bound, unless you're generating a lot of page misses from limited RAM.

Am I correct in assuming that two requests can be dependent on each other?

christopher baus
Friday, September 12, 2003

Sorry to have been unclear.  Yes, the approach I am considering here will almost definitely be CPU bound, unless .NET objects are fatter than I suppose.  For the general problem, though, it is unclear to me which effect might dominate.

I'm not sure I understand your question about two requests being dependent on each other though.  For obvious reasons, a sequence of reads will be independent of each other since no state is changed by these operations.  Transient data used in calculations is associated with a sort of Visitor object so that the domain objects remain undisturbed.  Sequences involving writes may introduce a dependency, and almost certainly will for some of the more far reaching operations.  The current set of data structures allows us to limit the effect of writes to a private image of the data until the changes are merged back into the public image (this part is really nasty to do in the database).

three-tiered newbie
Friday, September 12, 2003

The memory requirements worry me...

Two million objects could be a lot. Are you sure you can manipulate all of them in a reasonable amount of memory? If each object is something like 2-4 integers, that sounds reasonable, but if each one has a name, details, and so on, and comes closer to 100-1000 bytes, you may need a huge amount of memory on the server. Also, from your description it seems like you need an almost entirely new copy of these structures on each request, so if you end up touching even 1-2 GB of memory you are unlikely to be able to serve more than one client at a time. I would recommend estimating the amount of memory needed.
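As a purely back-of-envelope example: 2,000,000 objects at 100 bytes each is roughly 200 MB, which a well-specced server can hold; at 1,000 bytes each it is roughly 2 GB, which is the entire default user address space of a 32-bit Windows process before you count anything else.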

Also, make sure you understand the memory overhead of all the code you are planning to use. Memory management could kill your application with this many objects...

It could turn out that the current implementation, "do the modelling on the client," is more reasonable.

WildTiger
Friday, September 12, 2003

Thanks, WildTiger.

Certainly the memory management will be a challenge.  I plan to do some experiments to see how the server behaves as I put it under a higher and higher memory load.  Also, I've got to study .NET to see what kind of overhead is associated with even an empty object. 

Maybe I'm barking up the wrong tree with this approach.  Who knows?  Might be time to start brushing up on C again :)

As far as the copying goes, the way the data structure is designed, you only need to copy a small portion of the shared structure for most operations.  It works similarly to the way some version control packages manage branches - basically copy what is changing and share what is not.  When you are done, you merge the original branch and the changed branch into a new branch, which then becomes the shared branch.  It's actually pretty neat.
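In toy form (all names invented; .NET 1.1-ish C#), the idea is something like:

// Toy sketch of the copy-on-write branching idea -- names are made up
using System.Collections;

public class Node
{
    public string Key;
    public object Value;
}

public class Branch
{
    private Branch parent;                        // shared, never modified
    private Hashtable changed = new Hashtable();  // this branch's private copies

    public Branch(Branch parent) { this.parent = parent; }

    public Node Get(string key)
    {
        if (changed.Contains(key))
            return (Node)changed[key];            // our private copy
        return parent == null ? null : parent.Get(key);  // shared data
    }

    public void Set(string key, object value)
    {
        Node copy = new Node();                   // copy only what changes
        copy.Key = key;
        copy.Value = value;
        changed[key] = copy;
    }

    // Merge this branch's private copies into a new shared branch
    public Branch MergeIntoNew()
    {
        Branch merged = new Branch(parent);
        foreach (DictionaryEntry e in changed)
            merged.changed[e.Key] = e.Value;
        return merged;
    }
}

Reads fall through to the shared parent untouched; writes only ever land in the branch's private table until the merge.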

three-tiered newbie
Saturday, September 13, 2003

An empty object has 8 bytes of overhead in .NET on 32-bit: a four-byte method table pointer (the moral equivalent of a vptr) and a four-byte sync block index. If the object's been used for locking, then there's also a sync block associated with the object.

Chris Tavares
Saturday, September 13, 2003

I know that you're using .NET, but some of the ideas you talk about (holding everything in memory - essentially caching on a large scale) sound like the functionality provided by Prevayler - http://www.prevayler.org/

Prevayler stores everything in memory, so assuming that you have enough memory, performance is good (i.e. faster than going out to the database).

If you go with Prevayler (and therefore Java) you can (obviously) still provide a web-based client.

Other features you discuss (asynchronous processing) could be implemented using a messaging system, and Java provides an API for this via JMS. Somnifugi - http://somnifugi.sourceforge.net/ - provides an open-source JMS implementation.

Walter Rumsby
Saturday, September 13, 2003

Chris,

Thanks for the information.  Doesn't sound nearly as bad as I'd expected, thankfully.

three-tiered newbie
Saturday, September 13, 2003

Walter,

Thanks for the tip.  I'll take a look at the product you've linked.  I'm mostly taking the .NET route because that's what everyone else where I work is using and I guess the company has an enterprise license for everything Microsoft.  I think, though, that if I could make all this work, they'd go for it.

three-tiered newbie
Saturday, September 13, 2003

Hey folks,

thanks for all the input.  I have decided to go back to the drawing board with this stuff and maybe try to rework the basic solution to the problem with an eye towards opening up more implementation options.  It seems like the thing that is killing me is the need to operate globally so often, so I am going to think about ways to structure things so that I don't have to do that very much.  Thanks again for taking the time to reply.

three-tiered newbie
Sunday, September 14, 2003
