Fog Creek Software
Discussion Board




Architecture Question

OK, I've got an issue I've spent quite a while arguing with myself over.  I'm designing a system where there will be N application servers and one database server.  There is a list of items stored in the database, and the app servers will be responsible for pulling X items out at a given time and then doing some processor-intensive calculations on them.  So we've got the standard row-locking transactions that make sure multiple servers aren't working on the same data.  The issue I'm wondering about is whether I should "chunk" the data up or not.  I COULD have each server take a share of the total items it's supposed to be processing at a given time, but then I'm putting undue stress on the database server.  When doing join-heavy queries on a large number of rows, it usually doesn't matter much whether you're returning 10,000 rows or 40,000 rows; the slow part is the searching.  On the other hand, I don't know if it's a good idea to possibly let one server try to load a million rows or so into memory while the other servers aren't stressed at all.  Does this problem have a "perfect solution"?  Is the answer "it depends on your exact situation, just pick the least bad design"?
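A minimal sketch of the static "each server takes a share" option, assuming items have integer ids and the server count is fixed up front (the function name and parameters are illustrative, not from any particular library):

```python
def my_share(item_ids, n_servers, server_index):
    """Static partitioning: server k processes the ids where
    id % n_servers == k.  The shares never overlap, so no row
    locking is needed, but the load can be uneven if some items
    are much heavier to process than others."""
    return [i for i in item_ids if i % n_servers == server_index]

# Example: 10 items split across 3 servers.
shares = [my_share(range(10), 3, k) for k in range(3)]
```

The trade-off is exactly the one described above: the database does no coordination work at all, but a server stuck with the heavy items can fall behind while the others idle.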
Thoughts and comments always appreciated.

vince
Tuesday, August 12, 2003

I'm not sure what your question is. Can you elaborate, please? Thanks! :-)

Li-fan Chen
Tuesday, August 12, 2003

I would design the application so that you can share the work across multiple application servers. If you think you're going to need extra performance down the road, it's a lot easier to vertically scale the database and horizontally scale the application servers than to have to do both. Additionally, the database might surprise you by not being the bottleneck, particularly if the app servers are spending a considerable amount of time performing "intensive calculations".

One thing to consider, depending on your application's complexity, is writing a quick-and-dirty prototype or simulation and actually measuring performance prior to making any architectural decisions. Too often people make assumptions about performance that turn out to be grossly wrong.

Gerald Nunn
Wednesday, August 13, 2003

Spreading the load across several servers will give you more complex code and potentially locking problems. Complex doesn't mean ugly, though.
However, it would scale better.
Can you be more specific?

Alexander Chalucov (www.alexlechuck.com)
Wednesday, August 13, 2003

OK, I realize my original post was somewhat confusing.  To clarify: my question is how I can distribute the processing of a dataset among multiple servers.
  My first thought was to have each server take a chunk of the result set: first locking a certain number of rows, then executing the query and returning only a portion of the results (such as with the TOP or LIMIT keywords).  One of the problems I thought of with that approach, though, is that you'd have to issue the same query to the database multiple times, and when working with large tables, that could be a big slowdown.  Does that clarify things at all?
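One way to write that lock-a-chunk idea is a single atomic UPDATE that claims rows before reading them, so the expensive join runs once and each server only issues a cheap claim query. A runnable sketch, with a hypothetical `items(id, payload, status, owner)` schema and the stdlib sqlite3 standing in for the real database server:

```python
import sqlite3

def claim_batch(conn, worker, batch_size):
    """Atomically claim up to batch_size pending items for one worker.
    Running the UPDATE in a single transaction means no two workers
    can claim the same rows.  Schema and column names are illustrative."""
    with conn:  # one transaction: claim, then commit
        conn.execute(
            "UPDATE items SET status = 'claimed', owner = ? "
            "WHERE id IN (SELECT id FROM items "
            "             WHERE status = 'pending' LIMIT ?)",
            (worker, batch_size),
        )
    return conn.execute(
        "SELECT id, payload FROM items "
        "WHERE owner = ? AND status = 'claimed'",
        (worker,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT, "
             "status TEXT DEFAULT 'pending', owner TEXT)")
conn.executemany("INSERT INTO items (payload) VALUES (?)",
                 [("item-%d" % i,) for i in range(10)])
batch = claim_batch(conn, "server-A", 4)
```

Each server repeats claim-process-mark-done until no pending rows remain, which also balances the load naturally: fast servers simply come back for the next chunk sooner.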

vince
Wednesday, August 13, 2003

I understand. Why don't you create the tables and write some code to fill them with data, so that the datasets are as large as you expect? Then try to run your queries on the whole set and on a subset. Compare times. Try running them simultaneously. Measure performance. It shouldn't take you too long.
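That measurement can be mocked up in a few lines; everything here (schema, row counts) is made up for illustration, and a real benchmark should use the actual database and repeat each query several times:

```python
import sqlite3
import time

def time_query(conn, sql, params=()):
    """Rough wall-clock timing of one query; returns (seconds, row count)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return time.perf_counter() - start, len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO items (payload) VALUES (?)",
                 [("x",) for _ in range(100_000)])

# Whole set versus a LIMIT-ed subset, as suggested above.
full_t, full_n = time_query(conn, "SELECT * FROM items")
part_t, part_n = time_query(conn, "SELECT * FROM items LIMIT 10000")
```

Comparing `full_t` and `part_t` (and running the subset query from several processes at once) gives real numbers for the "10,000 rows versus 40,000 rows" question instead of a guess.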

Alexander Chalucov (www.alexlechuck.com)
Thursday, August 14, 2003

Well, I guess I'll try that.  I was just curious whether there was an established pattern for load-balancing the work on a set of data pulled from the database.

vince
Thursday, August 14, 2003
