Fog Creek Software
g
Discussion Board




Database Connections

Hi:

We are planning on developing an application - currently in the discovery phase. I was wondering what is the best methodology to investigate if our database can handle the load?

The application will be an online resource for customers of our software.

We used a shared Oracle with a couple of other teams. For verifying users logging-in we have LDAP.

However all the database updates must go to a single Oracle. Our allocation of database resource is 126 processes concurrently.

We expect the potential user base could be in excess of 20 000. Currently we do not have a distribution pattern.

What would be the best way to gauge in advance if the database resource is sufficient? Should we consider using a database connection broker?

Ram Dass
Tuesday, February 3, 2004

What's the middle tier gonna be built from?

veal
Tuesday, February 3, 2004

Thank you for your post :)

We are using an all Java environment - JSP/Servlets on the front-end.

The middle tier will be classes written by us - we do not expect to use application servers (like jBoss) or EJBs.

Ram Dass
Tuesday, February 3, 2004

I agree with the previous poster. We need more information.

To keep the database processes to a minimum you could
run the database in "shared server mode" which means one database process will handle all incoming connections and the listener will not spawn a new oracle process per connection.

What web server are you going to be using? Will you be using Oracle iAS with the mod_plsql module for database connectivity?

Patrik
Tuesday, February 3, 2004

The database will be Oracle 8i

Ram Dass
Tuesday, February 3, 2004

Currently we do not plan to use the mod_plsql module. But this can change.

Ram Dass
Tuesday, February 3, 2004

Be sure to pool your database connections at the middle tier and you should be able to scale up pretty well.  20,000 users is nothing to sneeze at, but they'll likely have a lot of think-time, and the web tier will slow them down.  I'd guess that you've correctly identified the database as your most pressing scalability problem.  (Almost everything else you can band-aid over with even modest hardware dollars.)  Assuming that you'll not have really bad programming in the mid-tier, call latency to the database will likely be your worst bottleneck.

Something like Jakarta's DBCP is probably better than trying to write your own connection pool, but since you'll be managing your own transactions, you'll need to be really careful about ensuring that you don't return a connection to the pool while it's holding an open transaction from an aborted Java operation.  If you're not careful, the next thread using that connection may inadvertently commit database operations from the previous user.  (Honestly, I'm talking about a degree of care that average programmers simply don't exhibit.  Think hard on this.  Making your middle tier a simple EJB Stateless Session Bean layer might be worthwhile only for the easy transaction control the container offers.)

Another scalability problem you'll find is that many typical Java programming practices will reduce your concurrency in the middle tier.  (You'll find threads queued up waiting for shared resources that should never have been shared, like objects of the abominable Singleton pattern and its equally bad variants.)

The best advice I can give you is to first write very good tests to test every operation your middle tier offers to higher tiers.  Dedicate a few test machines to viciously batter the middle tier after every daily release of your code, profile it whenever the results get poor, and drive out your own concurrency bottlenecks.  (Be sure your test environment has a similar network configuration to your production environment, so you don't lose time tracking down a transient performance problem due to Fred downloading porn that day.)

Getting back to the database issue, if your performance is still not great, trace your SQL calls at the database to identify the highest frequency calls and try to coalesce multiple SQL executes into fewer individual roundtrips by judicial use of, for example, stored procedures.

Have fun.  Scaling up to 20K concurrent users is exciting.

veal
Tuesday, February 3, 2004

About all I'd add is to abstract your middle tier to database connection to use stored procedures for everything, don't rely on knowledge of the database structure. 

Maintain the state of sessions in the database not in the middle tier where Sessions M:1 connections.  That causes a bit more to'ing and fro'ing between middle tier and database for transactions but it simplifies the middle tier considerably.

If you cast the output to XML then it can be styled at the front end.  Whether this is a performance improvement is moot though.

Simon Lucy
Tuesday, February 3, 2004

*  Recent Topics

*  Fog Creek Home